The introduction of DSPs (digital
signal processors) has contributed
immeasurably to speech-based
applications. DSP power is used 
in many areas, including transmission- 
noise reduction; signal amplification; 
speech synthesis for text-to-speech con- 
versions; speech recognition; and voice- 
message coding. 

In text-to-speech conversions, the 
DSP processes ASCII text, generates 
a phonetic transcription, and produces 
the synthetic speech. In speech recog- 
nition, the DSP system, in conjunction 
with an A/D converter, acquires the 
speech, compares it to stored templates, 
and indicates what word was uttered. 
The applications determine how to pro- 
cess the recognition results. 


Voice Coding 

Coding has gained wide use in voice- 
mail storage. The idea is to let you re- 
cord messages to a hard disk for future 
retrieval. Such voice messages, or even 
voice-annotated documents, can also 
be sent over networks. On request, you 
can retrieve, decode, and play back 
these messages. 

Coding provides great savings
in storage space, an important factor in 
applications such as dictation, voice 
annotation, PC-based automatic answer- 
ing machines, voice mail, and digital 
telephone-answering machines. An un-
coded voice file typically requires about
0.5 MB of memory for 1 minute of
recorded speech. Using a DSP on the 
motherboard or on an add-in board can 
reduce the memory required by as 
much as 85 percent. And coding algo- 
rithms are a must when documents con- 
taining voice annotations are sent over 
a network or through a modem. 


The SBCELP Algorithm 

A number of different types of voice- 
coding algorithms are available. Lern- 
out & Hauspie Speech Products of Bel- 
gium introduced a coding technique 
called SBCELP, which is based on
CELP (code-excited linear prediction).
It performs coding and decoding of 
speech signals at fixed bit rates in the 
range of 2000 to 10,000 bps. After cod- 
ing, the memory requirement for 1 
minute of speech is reduced to a range 
of 15 to 75 KB. This is as much as a 
30-fold savings in storage space over 
unencoded speech. 
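The storage figures above follow from simple arithmetic. A minimal sketch, assuming uncoded telephone-quality speech is 8000 samples per second at 8 bits per sample (the rates given later in this article) and taking 1 KB as 1000 bytes; the function name is hypothetical:

```python
def bytes_per_minute(bits_per_second: float) -> float:
    """Storage needed for one minute of audio at a given bit rate."""
    return bits_per_second * 60 / 8

# Uncoded telephone-quality speech: 8000 samples/s x 8 bits = 64,000 bps.
uncoded = bytes_per_minute(8000 * 8)      # 480,000 bytes, about 0.5 MB

# SBCELP runs at fixed rates between 2000 and 10,000 bps.
coded_low = bytes_per_minute(2000)        # 15,000 bytes
coded_high = bytes_per_minute(10_000)     # 75,000 bytes

print(f"uncoded: {uncoded / 1000:.0f} KB/min")                               # uncoded: 480 KB/min
print(f"SBCELP: {coded_low / 1000:.0f} to {coded_high / 1000:.0f} KB/min")  # SBCELP: 15 to 75 KB/min
print(f"best-case savings: {uncoded / coded_low:.0f}-fold")                 # best-case savings: 32-fold
```

The 32-fold ratio at 2000 bps is where the "as much as 30-fold" figure comes from.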

The SBCELP algorithm consists of 
three major parts, each one correspond- 
ing to a section of the human speech 
production system. In the first part, the 
STP (short-term prediction) analysis 
extracts the envelope of the input sig- 
nal. This is performed via a tenth-or- 
der LPC (linear prediction coding) fil- 
ter. The envelope corresponds to the 
first part of the vocal tract, from the
lips to the vocal cords.

You can view the LPC filter as a suc-
cession of 10 acoustic tubes that rep-
resent the vocal tract. As the vocal tract
changes shape during speech, the
corresponding tubes are modified in 
length and diameter, furnishing new 
values for the LPC coefficients. (Be- 
cause those coefficients are sensitive 
to quantization errors, the algorithm 
uses the LSP [line spectrum pair]
coefficients, which are less sensitive to 
these types of errors.) 
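The article does not say how SBCELP computes its ten LPC coefficients. A common textbook approach, sketched here and not presented as Lernout & Hauspie's implementation, is the autocorrelation method solved with the Levinson-Durbin recursion. The function names are hypothetical, and the conversion to LSP coefficients is not shown. The usage example analyzes one long synthetic signal so the recovered coefficients can be checked; a coder would run the same analysis on each 15- to 40-ms frame.

```python
import random

def autocorrelation(x, max_lag):
    """Return r[0..max_lag], the autocorrelation of the frame x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Find predictor coefficients a[1..order] so that
    x[n] is approximated by a[1]*x[n-1] + ... + a[order]*x[n-order].
    Returns (coefficients, remaining prediction-error energy)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break                                  # silent or degenerate frame
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                              # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

# Usage: a synthetic signal from a known two-pole "vocal tract",
# x[n] = 1.3 x[n-1] - 0.6 x[n-2] + noise, analyzed with a tenth-order fit.
random.seed(1)
x = [0.0, 0.0]
for _ in range(20_000):
    x.append(1.3 * x[-1] - 0.6 * x[-2] + random.gauss(0.0, 1.0))
coeffs, residual = levinson_durbin(autocorrelation(x, 10), 10)
print([round(c, 2) for c in coeffs[:3]])  # first two should be near 1.3 and -0.6
```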

The second part of speech, the vi-
bration of the vocal cords, is char-
acterized by frequency, or pitch. The
LTP (long-term prediction) analysis 
furnishes a value related to the pitch of 
the input signal. 
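The article gives no details of the LTP analysis. One simple way to obtain "a value related to the pitch" is to find the lag at which the signal best matches a delayed copy of itself, using a normalized correlation. This is a sketch under stated assumptions: the function name is hypothetical, and the search range of 20 to 147 samples (roughly 54 to 400 Hz at an 8000-Hz sampling rate) is not an SBCELP parameter.

```python
import math

def estimate_pitch_lag(x, min_lag=20, max_lag=147):
    """Return the lag (in samples) that maximizes the correlation between
    x[n] and x[n - lag], normalized by the energy of the delayed copy."""
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        num = sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        energy = sum(x[n - lag] ** 2 for n in range(lag, len(x)))
        if energy == 0.0:
            continue
        score = num / math.sqrt(energy)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Usage: glottal-like pulses every 80 samples (100 Hz at 8000 Hz).
period = 80
pulses = [1.0 if n % period == 0 else 0.0 for n in range(400)]
print(estimate_pitch_lag(pulses))   # 80
```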

The third part of a speech signal rep- 
resents the excitation of the signal (i.e., 
the air coming out of the lungs). De- 
termining the spectral shape of the ex- 
citation is important if you want to keep 
the natural quality of the human voice 
and avoid the metallic effect of digi- 
tal-speech playback. To solve this prob- 
lem, the algorithm determines the best 
possible excitation candidate for the 
excitation signal from among the ref- 
erence signals. These references can 
be predefined and stored in a dictionary or
codebook, or they can evolve dynami- 
cally with the signal, as is done in the 
LHS SBCELP algorithm. 
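A minimal sketch of the codebook search, assuming a small fixed codebook and leaving out the synthesis filtering and perceptual weighting that a real CELP coder applies before comparing: each candidate is scaled by its own least-squares gain, and the candidate with the smallest squared error against the target wins. The function name and toy codebook are hypothetical.

```python
def search_codebook(target, codebook):
    """Return (index, gain, error) of the codebook entry that best
    matches target after optimal scaling."""
    best = (None, 0.0, float("inf"))
    for index, candidate in enumerate(codebook):
        energy = sum(c * c for c in candidate)
        if energy == 0:
            continue
        corr = sum(t * c for t, c in zip(target, candidate))
        gain = corr / energy                          # least-squares gain
        error = sum((t - gain * c) ** 2 for t, c in zip(target, candidate))
        if error < best[2]:
            best = (index, gain, error)
    return best

codebook = [[1, 0, 0, 0], [0, 1, 0, 1], [1, -1, 1, -1]]
index, gain, error = search_codebook([2, -2, 2, -2], codebook)
print(index, gain, error)   # 2 2.0 0.0
```

In a codebook that evolves with the signal, as the article describes for SBCELP, the candidates would be refreshed as coding proceeds (in many CELP coders, from recent past excitation) instead of coming from a fixed table.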
To carry out this three-part coding on a DSP, you must first sample the analog speech signal at a frequency that varies according to the application. The speech quality offered by the telephone network is usually satisfactory for voice mail, answering machines, and voice annotations. In these cases, the sampling frequency chosen is usually 8000 Hz. Each sample can be represented by 8, 12, or 16 bits, which fixes the amount of memory needed to store a second of speech signal (64,000 bits, or 8 KB, in the first case). The digitized input signal is divided into successive frames of 15- to 40-millisecond duration, depending on the chosen final bit rate (from 2000 to 10,000 bps). The system performs an STP analysis on each frame. The corresponding 10 LSP coefficients are then quantized into a 24- or 32-bit number, depending, once again, on the bit rate. These bits are the code for the STP analysis.

At this point, each frame is divided into two, three, or four subframes for the computation of the LTP analysis and the dictionary search. The number of subframes is determined by the final bit rate. Working with these short subframes (about 5 ms each) keeps the pitch value and the dictionary choice as precise as possible throughout the speech signal. This is necessary because the shape of the excitation and the vibration frequency of the vocal cords change continuously.

Obviously, there is a trade-off between the speech quality and the number of bits used to code the LTP analysis and the dictionary candidates. The number of bits allocated to each feature, linked to the size of the frame and the number of subframes, determines the bit rate. The bit rates currently implemented are 2400, 4000, 4800, 7200, and 9600 bps. However, any bit rate between 2000 and 10,000 bps can be adopted after some fine-tuning of the algorithms. Enhanced perceptual- and dynamic-filtering techniques enable the algorithm to keep good speech quality, even for bit rates as low as 4800 bps.

Coded speech is stored in 8-bit chunks. Decoding is fast enough to reproduce coded speech in real time. The coding algorithm needs about 12 MIPS of computational power; decoding requires 1.5 MIPS. Numerous low-cost DSPs are available that can perform these tasks. Such processors will enable a new generation of speech applications.


Georges Zanellato is the R&D manager of speech and music coding at Lernout & Hauspie Speech Products. Bart Verhaeghe is the manager of the company’s U.S. marketing operations. You can contact them on BIX c/o “editors.”


If you sample a signal below the Nyquist rate (sub-Nyquist sampling), you get aliasing: The sample points do not contain enough information to reconstruct the original signal. Aliasing causes frequencies in the input signal above the Nyquist frequency to generate undesirable frequencies in the digital signal. These frequencies form a mirror image around the Nyquist frequency. For example, if there is a 22-kHz signal in the audio before it is sampled at 40 kHz (a Nyquist frequency of 20 kHz), the digital signal will contain an 18-kHz signal but not the 22-kHz signal. To ensure that aliasing does not occur, you must filter the signal to remove any components above the Nyquist frequency before it’s converted to the digital domain. This type of filter is called a low-pass filter because it passes all signals below a specified frequency.

You can see aliasing at work in a movie whenever the 24-frame-per-second sampling rate is too low to capture rapid motion. A well-known result is the effect of wagon wheels appearing to turn backward. In effect, there is insufficient information for the human eye to properly reconstruct the original signal.


Signal Reconstruction

Once a signal is digitized and processed, you often want to return it to the analog world so that it can be reconverted to its original form. This can take place in real time, or it can be delayed. Playing a CD is an example of a delayed reconstruction of a digital signal.

A raw digital signal that has been passed through a D/A converter would normally be unsuitable for direct use, because the converted signal is a staircase function following the path of the original signal and contains many additional signals above the Nyquist frequency (see figure 4). According to the Nyquist theorem, you can use a perfect filter to reconstruct the original signal from the staircase generated by the D/A converter. A perfect filter passes all frequencies below the Nyquist frequency and blocks any signal above the Nyquist frequency. Such a filter has a passband from 0 to the Nyquist frequency, a zero-width transition band, and a stop band from the Nyquist frequency to infinity. A frequency-domain plot of the perfect filter is shown in figure 5.

The process of filtering the output signal is called convolution. For each output sample, you multiply the time-domain representation of the filter’s frequency-domain characteristics by the current sample, n previous samples, and n future samples, and then sum the products. These multiplications are called filter taps. A filter that requires 10 multiplication operations is referred to as a 10-tap filter.

Because you must use future samples to calculate the reconstructed signal, the filter processor must wait until these samples are available. This causes a time delay in the filter of n samples.
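The convolution just described can be sketched as follows. Each output sample combines the current input, n previous inputs, and n future inputs, so a streaming implementation can only emit output k once input k + n has arrived. The 5-tap kernel is an illustrative smoothing filter chosen for readability, not a reconstruction filter, and the function name is hypothetical.

```python
def fir_filter(samples, taps):
    """Centered FIR filter with len(taps) = 2n + 1.

    Output y[k] uses inputs k-n .. k+n, so in a streaming system it cannot
    be produced until input k+n has arrived: a delay of n samples.
    Inputs outside the given list are treated as zero."""
    if len(taps) % 2 != 1:
        raise ValueError("expected an odd number of taps (2n + 1)")
    n = len(taps) // 2
    out = []
    for k in range(len(samples)):
        acc = 0.0
        for j, h in enumerate(taps):        # one multiply-accumulate per tap
            idx = k + n - j                 # runs from k+n down to k-n
            if 0 <= idx < len(samples):
                acc += h * samples[idx]
        out.append(acc)
    return out

impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(fir_filter(impulse, [0.1, 0.2, 0.4, 0.2, 0.1]))
# [0.0, 0.1, 0.2, 0.4, 0.2, 0.1, 0.0]
```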

The time-domain representation of the
perfect lowpass filter is the sinc
function, (sin x)/x. Unfortunately, this
function extends to infinity in both direc- 
tions, so the convolution computation must 
include an infinite number of multiplica- 
tion operations and an infinite delay. So, 
you need a way of reducing the value of n. 

In the real world, you reduce n with a 
filter that has a transition band of significant 
width. This is done by windowing the sinc
function to limit its nonzero
value range and, therefore, the number of 
calculations per sample to an acceptable 
level. This technique also narrows the pass-
band, because the filter must start rolling off
before the Nyquist frequency. However,
this can be compensated for by using a
higher sample rate than the minimum Ny-
quist rate. This is why the CD sample rate
is 44.1 kHz. The 22.05-kHz Nyquist frequency is
beyond the human hearing range, but the 
extra bandwidth allows a low-cost sam- 
pling and reconstruction filter to begin 
rolling off into the transition band at around 
18 to 20 kHz, as shown in figure 6. 
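A minimal sketch of the windowed-sinc idea. The article does not name a window, so a Hamming window is assumed here, and the cutoff is expressed as a fraction of the sample rate. Truncating the sinc to 2n + 1 taps is what makes the filter computable; the window tapers the truncation to reduce ripple, at the cost of a wider transition band. Both function names are hypothetical.

```python
import math

def windowed_sinc(cutoff, n):
    """Design 2n + 1 lowpass taps (n >= 1).

    cutoff: cutoff frequency as a fraction of the sample rate (0 < cutoff < 0.5).
    Uses a Hamming window; taps are normalized to unity gain at DC."""
    taps = []
    for k in range(-n, n + 1):
        if k == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * k) / (math.pi * k)
        window = 0.54 + 0.46 * math.cos(math.pi * k / n)
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def gain_at(taps, freq):
    """Magnitude response at freq (a fraction of the sample rate)."""
    re = sum(t * math.cos(2 * math.pi * freq * i) for i, t in enumerate(taps))
    im = sum(t * math.sin(2 * math.pi * freq * i) for i, t in enumerate(taps))
    return math.hypot(re, im)

taps = windowed_sinc(0.25, 50)        # cutoff at one-quarter of the sample rate
print(len(taps), round(gain_at(taps, 0.1), 3), round(gain_at(taps, 0.4), 4))
```

Because the taps are symmetric, the filter is linear phase. Raising n narrows the transition band but costs more multiplications per output sample, which is the trade-off the text describes.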


Digital Filtering 

There are many forms of filters in signal 
processing; they are classified according to 
the function they perform. For example, 
lowpass filters pass low frequencies while 
attenuating the higher frequencies. High- 
pass filters pass high frequencies and at- 
tenuate lower frequencies. Bandpass fil- 
ters pass frequencies in a range, or band, 
while attenuating frequencies outside the 
band. 

Filters are also classified according to 
the way they are implemented. One com- 
mon implementation uses both the input 
and output samples to calculate the filtered 
output signal. Because you feed back the 
past output samples of the filter to com- 
pute the current output sample, you con- 
tinuously recycle energy within the filter. 
This means that the response of the filter to 
an impulse (or spike) is infinite in length. 
This type of filter is called an IIR (infinite
impulse response) filter. 

IIR filters are often used because of their 
ability to create sharper transitions with 
little computation. However, one of the 
desirable attributes of a filter, called linear 
phase, is missing in an IIR filter. Linear 
phase refers to the characteristic where all 
frequency components of the original sig- 
nal are delayed by the same number of 
samples before they arrive at the output. 

Another common filter implementation 
called an FIR (finite impulse response) fil- 
ter uses only its input samples to calculate 
the filtered output signal. In this case, an 
impulse applied to the filter will die out 
after n samples, where n is the number of 
taps in the filter. 

The advantage of FIR filters is that they 
are linear phase. Unfortunately, more com- 
putation is required to achieve the desired 
sharp transitions with this filter design. 
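The difference in impulse response is easy to see in a short sketch. The one-pole IIR filter below (feedback coefficient 0.5, chosen for illustration) feeds its previous output back, so its response to a single impulse keeps halving but never reaches zero; the 4-tap FIR averager uses only inputs, so its response ends after four samples. Function names are hypothetical.

```python
def iir_one_pole(x, a=0.5):
    """y[k] = x[k] + a * y[k-1]: the previous output is fed back."""
    y, prev = [], 0.0
    for sample in x:
        prev = sample + a * prev
        y.append(prev)
    return y

def fir_average(x, taps=4):
    """y[k] = mean of x[k-taps+1 .. k]: inputs only, no feedback."""
    return [sum(x[max(0, k - taps + 1):k + 1]) / taps for k in range(len(x))]

impulse = [1.0] + [0.0] * 9
iir = iir_one_pole(impulse)
fir = fir_average(impulse)
print(iir)   # 1.0, 0.5, 0.25, ... halves every sample, never reaching zero
print(fir)   # 0.25 for four samples, then exactly 0.0
```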


Digital Storage and Real-Time Processing

Digital signals can be stored on hard disks 
for editing and playback. Although this is 
an obvious use of these signals, only re- 
cently have desktop computers had enough 
storage and processing speed to make this 
possible. 

Figure 4: Digital signals converted directly to analog show a staircase effect and in some cases produce signals above the Nyquist frequency.

Figure 5 (The Perfect Filter: amplitude versus frequency, showing a passband at an amplitude of 1.0, a zero-width transition band, and a stopband): Viewed from the frequency domain, a perfect filter stops all frequencies above f, the Nyquist frequency.

Figure 6: To keep processing needs reasonable, a real filter for CD playback uses a windowed sinc function and depends on a sampling rate that is more than two times the highest frequency to be reproduced.

Processing-speed requirements can be
divided into two major categories: real-time
processing and non-real-time processing
(which is also called background, or time- 
share, processing). Real-time processing 
occurs when the process can accept or pro- 
duce sampled data at the same rate as the 
conversion hardware. Furthermore, real-time
processing implies a reasonable guarantee
that the process will not be interrupted
or late, so a continuous flow of data
can be supported between the process and 
the conversion hardware. 

Non-real-time processing is used to pro- 
cess data over a period of many hours or 
days. The results are stored in memory or
on disk and are analyzed or played back in
real time. This type of processing is useful
when there is not enough processing power
to handle the desired function in real time.

Consider the processing required to han- 
dle stereo CD audio in real time. There 
are 44,100 samples per second per channel 
that must be processed. If the desired pro- 
cessing is a filter with 40 taps, each of 
which requires one multiplication opera- 
tion and one addition operation (multiply- 
accumulate, or MAC), 7.056 million op- 
erations per second are required (two 
channels x 44,100 samples x 40 taps x 
two operations). This processing load can 
be handled by most modern-day DSP 
chips. However, a standard RISC or CISC 
processor would require many times this 
number of instructions per second to do 
this type of calculation, because separate 
instructions are required to access, load, 
and store the data. To process a similar 
video filter in real time, over a billion cal- 
culations per second are required. This is 
the reason most video signal processing 
is done with custom chips. 
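The 7.056-million figure can be reproduced directly. A small sketch; the function name is hypothetical, and counting a MAC as two operations follows the article:

```python
def ops_per_second(sample_rate, channels, taps, ops_per_tap=2):
    """Operations per second for an FIR filter running in real time.
    ops_per_tap=2 counts one multiply and one add per tap (a MAC)."""
    return sample_rate * channels * taps * ops_per_tap

cd_load = ops_per_second(44_100, 2, 40)
print(f"{cd_load:,} operations per second")            # 7,056,000 operations per second

mac_count = ops_per_second(44_100, 2, 40, ops_per_tap=1)
print(f"{mac_count:,} multiply-accumulates per second")  # 3,528,000 multiply-accumulates per second
```

On a DSP that completes a MAC in a single instruction, the second figure is roughly the instruction load; a processor that needs separate load, multiply, add, and store instructions would need several times as many.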

In addition to requiring a great deal of 
processing power, these digital signals take 
lots of space. For example, an hour of CD- 
quality music takes over 600 MB of storage, 
and a minute of video takes over 500 MB. 
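The CD figure can be checked with a one-line formula, assuming the standard CD format of 16-bit stereo at 44.1 kHz (the article does not restate the word size here). The function name is hypothetical:

```python
def pcm_bytes(seconds, sample_rate=44_100, channels=2, bits=16):
    """Uncompressed PCM storage in bytes."""
    return seconds * sample_rate * channels * bits // 8

hour = pcm_bytes(3600)
print(f"{hour:,} bytes, about {hour / 1e6:.0f} MB")   # 635,040,000 bytes, about 635 MB
```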


Data Compression 

One of the most popular signal-process- 
ing functions is data compression. Signals 
can be compressed, or coded, to reduce 
their large storage requirements. There are 
two basic types of compression: lossless 
and lossy. These terms refer to the effect
the compression algorithm has on the in- 
formation in the original signal. 

Lossless compression is used when a 
reconstructed signal must be the same as 
the original signal. A common example 
of lossless compression is disk file com- 
pression. Lossless-compression algorithms 
can usually compress digital data to
between one-half and one-quarter of its original size.
Some common lossless-compression al- 
gorithms are Huffman and Lempel-Ziv. 
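Python's standard zlib module implements DEFLATE, which combines LZ77 (a Lempel-Ziv variant) with Huffman coding, so it illustrates both algorithm families named above. A minimal round-trip sketch; the ratio depends entirely on the data, and this deliberately repetitive sample compresses far better than typical files do:

```python
import zlib

original = b"one, two, three, four, five. " * 200
packed = zlib.compress(original, 9)     # 9 = highest compression level
restored = zlib.decompress(packed)

assert restored == original             # lossless: bit-for-bit identical
print(f"{len(original)} -> {len(packed)} bytes "
      f"({len(packed) / len(original):.1%} of original)")
```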

Lossy compression algorithms are used 
when the reconstructed signal does not 
have to be identical to the original signal. 
This is the case with audio, video, and im- 
age compression. These data types often 
have more information than can be per- 
ceived by the human receiver. Thus, the 
compression algorithm can afford to lose 
information. 

Lossy compression can frequently com-
press digital data to one-tenth or even
one-hundredth (or less) of its original
size, depending on how much computation
can be expended and the desired quality of 
the reconstructed signal. Lossy algorithms 
include ADPCM (adaptive differential
pulse code modulation), CD-XA (Com-
pact Disc Extended Architecture), and
subband coding for audio; JPEG (Joint Photo-
graphic Experts Group) for images; and
MPEG (Moving Picture Experts Group)
for both audio and video data. 

Nonuniform quantization is also used 
for compressing signals. One of the most 
widespread techniques is called vector 
quantization. Instead of the signal being 
stored one sample at a time, a token rep- 
resenting an entire set of samples (called a 
vector) is stored. For example, if the se-
quence “one, two, three, four, five” occurs
often in a signal, a token can be used to 
represent the sequence. This technique de- 
rives the correct set of tokens at the com- 
pression end and requires an enormous 
number of calculations. Decompression is 
fast, however, and requires only a simple 
table lookup. This allows the use of low- 
cost playback equipment to decompress 
the signal. 
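A minimal sketch of vector quantization, assuming a small hand-made codebook (real coders train the codebook from example data, which is where the heavy computation goes). Encoding searches for the nearest codebook vector; decoding is the simple table lookup described above. Function names are hypothetical.

```python
def vq_encode(samples, codebook):
    """Split samples into vectors of the codebook's dimension and return the
    index of the nearest codebook entry for each. Trailing samples that do
    not fill a whole vector are ignored in this sketch."""
    dim = len(codebook[0])
    indices = []
    for start in range(0, len(samples) - dim + 1, dim):
        vec = samples[start:start + dim]
        distances = [sum((a - b) ** 2 for a, b in zip(vec, code))
                     for code in codebook]
        indices.append(distances.index(min(distances)))
    return indices

def vq_decode(indices, codebook):
    """Decoding is a table lookup: concatenate the indexed vectors."""
    out = []
    for i in indices:
        out.extend(codebook[i])
    return out

codebook = [[0, 0, 0, 0, 0], [1, 2, 3, 4, 5], [5, 4, 3, 2, 1]]
signal = [1, 2, 3, 4, 5, 0, 0, 1, 0, 0, 5, 4, 3, 2, 2]
tokens = vq_encode(signal, codebook)
print(tokens)                        # [1, 0, 2]
print(vq_decode(tokens, codebook))   # [1, 2, 3, 4, 5, 0, 0, 0, 0, 0, 5, 4, 3, 2, 1]
```

The decoded signal differs slightly from the input, which is why vector quantization is a lossy technique.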


Sample-Rate Converters 

It is often useful to convert a signal from 
one sample rate to another. This is gen- 
erally required when passing a signal from 
one system to another. For instance, a sig- 
nal recorded at 48 kHz on a professional 
digital tape deck may have to be convert- 
ed to 44.1 kHz for storage on a CD. An- 
other example of sample-rate conversion 
is when a signal is passed between an au- 
dio system and a telephone system. Each 
of these systems has a different sample 
rate, selected for optimum utility for a giv- 
en function. Digital telephone systems 
typically use an 8-kHz sample rate, and 
digital audio systems usually use 44.1 kHz 
or 48 kHz. 

Sample-rate converters are of two types: 
up converters and down converters. The 
up converter generates more output sam- 
ples than input samples; the down con- 
verter does exactly the opposite. In either 
case, the computational process takes the 
basic form of a digital filter that removes 
aliases and unwanted out-of-band arti- 
facts. 
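A minimal sketch of rational sample-rate conversion by a factor up/down: insert up - 1 zeros between samples, lowpass filter to remove images and aliases, then keep every down-th sample. The Hamming-windowed sinc filter, the 63-tap length (n = 31), and the function names are assumptions for illustration. Converting 48 kHz to 44.1 kHz corresponds to up/down = 147/160; this direct form would be slow for that ratio, which is why production converters use more efficient (for example, polyphase) structures.

```python
import math

def lowpass_taps(cutoff, n):
    """2n + 1 Hamming-windowed sinc taps; cutoff is a fraction of the rate."""
    taps = []
    for k in range(-n, n + 1):
        ideal = 2 * cutoff if k == 0 else math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        taps.append(ideal * (0.54 + 0.46 * math.cos(math.pi * k / n)))
    return taps

def resample(x, up, down, n=31):
    """Convert the sample rate by the ratio up/down (positive integers)."""
    # 1. Up convert: insert up - 1 zeros after each sample.
    stuffed = []
    for s in x:
        stuffed.append(s)
        stuffed.extend([0.0] * (up - 1))
    # 2. Lowpass below the lower of the two Nyquist frequencies (relative to
    #    the up-converted rate); the gain of `up` restores the amplitude
    #    lost to zero stuffing.
    taps = [up * t for t in lowpass_taps(0.5 / max(up, down), n)]
    filtered = []
    for k in range(len(stuffed)):
        acc = 0.0
        for j, h in enumerate(taps):
            idx = k + n - j
            if 0 <= idx < len(stuffed):
                acc += h * stuffed[idx]
        filtered.append(acc)
    # 3. Down convert: keep every down-th sample.
    return filtered[::down]

# Usage: 8 kHz telephone audio up to 16 kHz (up/down = 2/1).
doubled = resample([1.0] * 100, 2, 1)
print(len(doubled))
```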


Adaptive Filtering

There are numerous cases where simple 
filtering is not effective or where the cost 
of the filter is too high for an application. 
In these cases, adaptive filtering is often 
used. An adaptive filter adjusts its param- 
eters based on the content of the signal. 
In fact, adaptive processes can select from 
a set of possible filters, depending on the 
signal. 

An interesting example of this is the 
CD-XA compression algorithm. In this 
technique, audio is broken up into blocks 
of 28 samples. Depending on the content of
each block of audio, one of four different 
filters is selected that will best match the 
original signal during decompression. This 
takes substantial processing during the 
compression stage—each block must be 
compressed four different ways and then 
decompressed and compared to the origi- 
nal signal. The best match is selected, and 
the coded form becomes part of the com- 
pressed data stream. The filter selection is 
included in the data stream to allow the
decoder to use the correct filter for that
block. 
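A simplified sketch of the block-adaptive selection described above. The four second-order predictors use the coefficient pairs commonly documented for CD-XA ADPCM (0 and 0; 60/64 and 0; 115/64 and -52/64; 98/64 and -55/64); treat them as an assumption, because the article does not list them. Instead of fully encoding and decoding each block four ways, the sketch picks the predictor with the smallest prediction-error energy, and it predicts from the original samples, whereas a real encoder tracks the decoder's reconstructed samples. Function names are hypothetical.

```python
# Assumed CD-XA predictor coefficients (k1, k2), in units of 1/64.
PREDICTORS = [(0, 0), (60, 0), (115, -52), (98, -55)]

def residual_energy(block, history, k1, k2):
    """Sum of squared errors for x[n] ~ (k1*x[n-1] + k2*x[n-2]) / 64."""
    prev1, prev2 = history
    energy = 0.0
    for sample in block:
        predicted = (k1 * prev1 + k2 * prev2) / 64
        energy += (sample - predicted) ** 2
        prev2, prev1 = prev1, sample
    return energy

def choose_filter(block, history=(0.0, 0.0)):
    """Index (0-3) of the predictor that best fits this block.
    history holds the previous two samples, most recent first."""
    energies = [residual_energy(block, history, k1, k2) for k1, k2 in PREDICTORS]
    return energies.index(min(energies))

def choose_filters(samples, block_size=28):
    """Select a filter for every 28-sample block. The indices travel in the
    data stream so the decoder knows which filter to apply to each block."""
    choices, history = [], (0.0, 0.0)
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        choices.append(choose_filter(block, history))
        if len(block) >= 2:
            history = (block[-1], block[-2])
    return choices

ramp = [100.0 * n for n in range(56)]
print(choose_filters(ramp))   # [2, 2]
```

All of the search cost sits in the encoder; the decoder only reads the index and runs one fixed filter, which is the asymmetry the article points out.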

The advantages of this approach are 
that most of the computation is required
during encoding and that simple filters 
can be used during decoding. This is de- 
sirable because compression usually hap- 
pens once at the production facility and 
playback can occur hundreds of times by 
many people in different locations. Simi- 
lar operations can be performed with adap- 
tive filters in noise-canceling and noise-re- 
duction applications. 


Future Signals 

As digital signal processing becomes wide- 
spread and processing power increases, 
more focus will be placed on functions for 
personal computers and digital assistants 
that until recently were only dreams. Dig- 
ital signal processing makes it possible for 
your computer to use multimedia infor- 
mation in real time. 

You can look forward to a rapid prolif- 
eration of amazing new capabilities over 
the coming years based on the marriage 
of DSPs and standard CPUs. From speech 
recognition to real-time digital video, DSPs 
will change the way you interact with your 
computer.


Eric C. Anderson is manager of the Sound 
& Signal Processing Group within the Ad- 
vanced Technology Group of Apple Com- 
puter (Cupertino, CA). Stephen Shepard 
and Phil Sohn are members of the group. 
You can contact them on BIX c/o “edi- 
tors” or on AppleLink as “anderson13.” 
The integration of DSP technology on the desktop is already under way


JOHN BRYAN 


The integration of DSPs (digital sig-
nal processors) on the desktop, ei-
ther as an add-in board or as a part
of the motherboard, brings technol- 
ogies such as continuous speech recogni- 
tion closer to everyday reality. And pro- 
grammable, powerful signal processors 
are being used for various other new ap- 
plications. 

DSP technology has been available for 
quite a while, but until now, only special-
ized applications (e.g., disk head position-
ing, spectrum analysis, dedicated audio and
video processing, and PBX systems) have 
migrated to personal computers. With re- 
cent advances in hardware, firmware, and 
software, signal processing is moving into 
desktop applications that will bring excite- 
ment to business computing. 


The DSP Difference 

What is the difference between a standard 
CPU and a DSP? There are architectural 
differences, certainly (see “Inside Signal 
Computing” on page 177), but the funda- 
mental difference lies in the ability of the 
DSP to handle real-time data streams gen- 
erated by sampling analog data patterns. 
By their nature, signals are constantly 
changing. If a computer is unable to act 
on the data as it happens, the computa- 
tional results, if any, will be invalid. So, 
signal processors must be able to quickly 
interpret and react to data and perform the 
necessary calculations, such as multiply 
and accumulate. 

One of the main advantages of inte- 
grating a DSP with a standard CPU is that 
such an arrangement can provide concur- 
rence of signal-processing operations with 
respect to general computing tasks. A DSP 
isn’t inherently any faster than a similarly 
clocked CPU, but DSPs excel at particular
functions. The relationship between a DSP 
and a CPU in a desktop system is analo- 
gous to that of a math coprocessor and a 
main processor. A fixed-point 386 can do 
all the floating-point calculations that the 
387 would normally handle, but the 387 
is a lot faster. The same is true for DSPs. 
For example, for the types of calculations 
that call for signal processors, Texas In- 
struments claims that its TMS320C 16-bit 
fixed-point DSP can deliver three to five 
times the MIPS of a 386 CPU. 


DSP Data 

One of the most important facets in deter- 
mining how to implement a DSP applica- 
tion is ascertaining the sampling rate for 
data. Speech—at least at the quality you 
hear over the telephone—is one of the less 
demanding DSP applications from a pro- 
cessing point of view. A microphone, 
which is a transducer that converts sound 
waves into voltage levels, is the most com- 
mon data source. The data flows into an 
A/D converter, which produces samples 
of the data 8 bits wide at the rate of 8 kHz,
or 8000 times a second. A DSP takes this 
data stream and performs whatever calcu-
lations the software calls for. The output
goes to a D/A converter and then to a 
speaker, which turns the electrical signal 
back into sound. If this data goes to disk, it 
takes up about 8 KB of disk space per sec- 
ond of speech. 


An 8-kHz sampling rate (about 8 KB per
second at 8 bits per sample) is about as
low as you can go and still get decent
sound quality. CD-quality sound must be
sampled at a faster rate (44.1 kHz), and
each sample should be 16, 24, or 32 bits
wide. Storage requirements scale accord-
ingly, with typical high-quality stereo
sound taking up to 176 KB per second of
sound.

Sound is a lightweight when it comes to
storage consumption. Real-time video
can require as much as 1 MB per second,
which quickly fills up a 40-MB hard drive. 
In fact, one of the primary uses of the DSP 
in an application is the compression/de- 
compression of the data stream as it moves 
onto and off of the disk. 

Given that DSPs are adept at handling 
speech and video data, it’s not surprising 
that the prime motivation for using DSPs 
in personal computers is multimedia ap- 
plications. In fact, without DSPs, true mul- 
timedia would remain a pipe dream, be- 
cause general-purpose CPUs just don’t 
have the horsepower to handle multimedia 
data effectively. 


DSP on the Desktop 

Next (Redwood City, CA) was the first 
major system manufacturer to recognize 
the value of bringing DSP technology to 
the desktop. It has included DSP hardware 
and the necessary operating-system sup- 
port in every workstation it has produced. 
Next uses the Motorola 56001, a 24-bit 
processor with expandable local RAM, to
support the multimedia efforts of its
ISVs (independent software vendors). 
Next’s object-oriented NextStep operat- 
ing system also includes objects for audio 
and video data manipulation, ISDN tele- 
phony, CD sound (you can listen to your 
favorite music while you’re computing), 
and other functions. 

Another system manufacturer with a 
commitment to using DSPs is Apple (Cu-
pertino, CA). Apple has always empha-
sized the value of quality sound in a com-
puting environment, but it has only re-
cently announced its intention to integrate
a full-fledged programmable DSP into the 
Mac platform. Apple started the project in 
1987, and after trying synthesis chips, 
phase-locked loops, and static-program 
DSPs, the company finally decided that it 
needed a fully clean, 32-bit, big-endian, 
byte-addressing processor (e.g., the 68030 
and the 68040). 

Apple has teamed up with another in-
dustry giant, AT&T, to produce ARTA
(Apple Real-Time Architecture), a real- 
time multitasking and multiprocessing sig- 
nal-processing extension for the Mac. The 
goal of this DSP architecture is to provide 
a scalable standard platform for most types 
of signal processing, including speech, 
sound communications, image processing, 
and music. 

ARTA features the AT&T DSP3210 
processor, which is a fully programmable 
32-bit DSP with on-board cache and a 32- 
bit bus to local static RAM or to page- 
mode DRAM. The DSP3210 is capable 
of clock rates of up to 66.6 MHz. ARTA’s 
kernel is only 512 words (2048 bytes) and 
takes up one-quarter of the DSP3210’s on- 
chip memory. 


DSP desktop applications

• data compression/decompression
• data communications
• speech recognition
• speech synthesis
• sound synthesis
• image processing

The platform’s software component is 
composed of two parts. The host portion 
takes care of management functions, and 
the DSP portion performs real-time data- 
stream processing. 

ARTA is actually a dual API system. 
In System 7.0, developers work with the 
API Toolboxes, which use drivers that link 
to the hardware of the Mac. For DSP ap- 
plications, there is the DSP Module (a 
toolbox equivalent), which links the DSP 
kernel to the DSP hardware. With this dual- 
API system, the DSP programmer can 
write code without knowing or using any 
Mac code, and the Mac application devel- 
oper can produce software that takes ad- 
vantage of the DSP without knowing or 
using any DSP-specific code. 

ARTA is a multiprocessing system that 
supports multiple DSPs. Apple will supply 
DSPs only as a part of the motherboard, 
not as NuBus cards, although it will li- 
cense ARTA to NuBus developers. Ap- 
plications developed under license from 
Apple will operate seamlessly within the 
ARTA environment. And Mac systems 
with integrated DSPs will be available next 
year. 

Photo 1: MWave is a collaborative effort by IBM, Texas Instruments, and
Intermetrics to produce a DSP board for the PC. The heart of MWave is TI’s
TMS320M500 DSP chip, which delivers 17-MIPS performance in a 16-bit data
fixed-point package.


Apple has an array of uses planned for
ARTA. Besides digital audio functions
(e.g., compression, noise reduction, and
mixing), Apple plans to promote the de-
velopment of speech and communications
programs. Speech synthesis and speaker-
independent speech recognition are an
exciting step toward creating the first tru- 
ly user-friendly human-computer inter- 
face. With this feature, a computer could 
tell a novice user, in English or in any oth- 
er language, exactly how to set up the sys- 
tem to best suit the environment and pro- 
posed uses. Other possible applications 
are voice-edited documents, video tele- 
phones, video- and audio-enhanced soft- 
ware installation, and presentation and ed- 
ucation software. 


Twin Peaks 

In addition to working with Apple on the 
Mac platform, AT&T has also come up 
with a DSP solution for MS-DOS com- 
puters. VCOS (Visible Caching Operating 
System), AT&T’s operating system for the 
DSP3210, is multitasking and resides in 
the memory local to the signal processor. 
Developers can use the VCOS and VCAS 
(Visible Caching Application Server) mod- 
ules to integrate AT&T’s DSP technolo- 
gy into general-purpose computing sys- 
tems. And VCOS relieves applications 
programmers and system integrators from 
having to deal with the complexities of 
DSP programming. 

Not to be outdone, IBM (Armonk, NY) 
has also announced its intentions to get 
into the personal computer DSP market. 
IBM has formed an alliance with Texas 
Instruments and Intermetrics, a software 
development company, to bring out a prod-
uct called the MWave (see photo 1). IBM
will first be producing a plug-in board for 
the ISA or Micro Channel architecture bus, 
but it also has plans to produce PS/2 sys- 
tems with DSP technology on the moth- 
erboard by mid-1993. 

TI engineered the DSP chip used in the 
MWave. The TMS320M500 delivers 17-
MIPS performance in a 16-bit data fixed-
point package. The processor has seven
lines for serial data input and a bus data 
line (which is host-specific) and multi- 
channel DMA for all I/O. Although the 
chip deals with 16-bit data, the program 
memory bus is 24 bits wide. The DSP is in- 
tegrated into a board that includes a MIDI 
port, a UART (universal asynchronous re- 
ceiver/transmitter), stereo A/D convert- 
ers, and telephony AIC interfaces. 

TI will develop an OEM distribution 
channel for the MWave, offering it to sys- 
tems manufacturers for integration into 
their motherboards. At this fall’s COM- 
DEX, TI was scheduled to announce a DDK 
(Driver Development Kit) with sample 
drivers for speech, audio, and telephony. 

Operating-system support is provided 
by IBM, whose Burlington, Vermont, 
product group developed a multitasking 
operating system for the MWave project. 
This embedded operating system, the 
MWave DSP manager, sits on top of OS/2 
or Windows and can handle functions like 
JPEG (Joint Photographic Experts Group) 
video compression, voice recognition, data 
and fax modems, echo cancellation, music,
and text-to-speech conversion.

The MWave DSP manager is a virtual 
device driver that provides a high-level 
API for digital signal processing in either 
environment. This API is the platform that 
provides a socket for device drivers. IBM 
wants to use this technology to increase 
desktop functionality—integrating the fax, 
telephone, dictation machine, and other 
office appliances into the PC. 

IBM’s objective is for this product to 
become as pervasive as the math coprocessor. For this to occur, application
development will have to keep pace with the development of support hardware.
To further this end, IBM is out to enlist 
the support of major software develop- 
ment houses, such as Microsoft, Borland, 
Lotus, and WordPerfect. 

To support the creation of all these ap- 
plications, Intermetrics was tagged to come 
up with the development tools for the pro- 
grammer. What it’s providing is a stan- 
dard ANSI C software development kit, 
complete with language, compiler, assem- 
bler, debugger, and a set of programming 
tools that are generic to the world of C 
programmers. A provider of DSP appli- 
cations for the space-shuttle program, In- 
termetrics has been in the business of de- 
veloping system application software for 
embedded systems for 23 years (until re- 
cently, most DSP applications were im- 
plemented in embedded systems). 

One of the more helpful tools in the 
MWave environment is a nice visual de- 
bugger that enables you to trap signals 
coming in to the DSP in real time and ob- 
serve their interaction with the host appli- 
cation. Intermetrics will provide one set 
of tools for Windows and another for OS/2. 


DSP and Communications 

Besides the major systems vendors, oth- 
er companies, of both hardware and soft- 
ware orientation, are in the desktop DSP 
market. Many of them, especially the soft- 
ware firms, create products for Next work- 
stations, mainly because Next has had in- 
tegrated DSP support longer than anyone 
else. But many vendors have produced 
hardware/software solutions to specific 
vertical markets (e.g., radar research or 
digital instrumentation) for both the PC 
and the Mac platforms, and many more 
are moving in this direction, as DSP tech- 
nology becomes more of a standard than a 
standout. 

This year, Hayes Microcomputer Prod- 
ucts (Atlanta, GA) announced the Hayes 
ISDN Extender, a network-interface mod- 
ule that provides ISDN Basic Rate Access 
and analog telephone-line connectivity to 
Next computers. The ISDN Extender can be used for remote LAN connections; high-speed digitized-voice, data, and fax-modem communications; and other multimedia functions (e.g., video transmission).

Photo 2: The Lightning Effects Macintosh accelerator from Spectral Innovations targets specific applications. This product provides enhanced performance for image-processing applications.

Ariel (Highland Park, NJ) is another 
vendor that has concentrated its efforts in 
the Next market, for which it makes a wide 
range of products, from the $500 Digital 
Microphone to the $15,000 IRCAM signal- 
processing workstation. Even though the 
Digital Microphone, ProPort, and DAT- 
Port all deal specifically with CD-quality 
sound and use the Nextstation’s own Mo- 
torola DSP56001 DSP, the IRCAM and 
the QuintProcessor each feature their own 
DSPs. The QuintProcessor contains five 
27-MHz 56001 DSPs, four of which han- 
dle DSP functions while the fifth manages 
on-board memory, storage, and interpro- 
cessor communication. The IRCAM uses 
two Intel 860 RISC processors to provide 
a parallel-processing environment, with a 
56001 DSP for data I/O. The IRCAM also 
comes with its own operating system, 
CPOS. 

Metaresearch (Portland, OR) is a soft- 
ware firm whose products Digital Ears, 
SoundWorks, and Color Digital Eye can be 
used creatively in multimedia presenta- 
tions. SoundWorks is essentially a sound 
mastering board, a digital version of the 
professional mixing board that you might 
find in any recording studio. Digital Ears is
a stereo digitizer that captures CD-quality
sound for the Next. And Color Digital Eye 
is a video frame grabber for entering and 
editing video images. 

Another company that concentrates on 
sound, music, and professional recording 
is Digidesign (Menlo Park, CA). Digi- 
design has been producing products for 
the Mac for three and a half years, although 
it does not target the Mac user as much 
as the recording engineer or broadcast 
professional. Its three products (Audio- 
media II, Sound Tools II, and Pro Tools) 
combine the Motorola 56001 with high- 
end software to do stereo or multitrack 
recording and mixing functions (e.g., com- 
pression, waveform editing, equalization, 
chorusing, echo, and pitch shifting). They 
can also do SMPTE (Society of Motion 
Picture and Television Engineers) synchronization.


Processing Pictures 

Giga Operations (Berkeley, CA) is a start- 
up company whose aim is to develop a 
low-cost, massively parallel digital signal- 
processing board for desktop computer 
systems. GigaOps uses the Analog De- 
vices 2105 DSP, a 16-bit processor rated 
at 10 MIPS. Giga Operations puts four 
2105s, 1 MB of DRAM, and a Xilinx PGA 
(Programmable Gate Array) into a single 
surface-mount module called a SIIMOD
(Scalable Intelligent Image Module). Each
of the SIIMODs provides 40 MIPS of sig- 
nal-processing power. The end product, 
an ISA bus card called the T-800, supports 
up to eight of these modules, for a total
throughput of 320 MIPS, with 32
DSPs and 8 MB of RAM. With four cards 
in one system, the total processing power 
becomes 1280 MIPS, hence the company 
name. 

Giga Operations is targeting the image-processing market. Its long-term goal is
the execution of real-time math-intensive 
image processing at the read/write rate of 
a fast hard drive. To do this, it provides a 
proprietary C compiler called the Stream 
Splitter, which takes the serial data stream 
of signals and converts it for operation
in parallel mode. One of the slick features
of the T-800 is that the PGAs allow the 
user code to dynamically configure the 
board in real time (on the order of microseconds). Thus, alternative virtual
machines can take advantage of resources in parallel as they are required.

Spectral Innovations (Santa Clara, CA) 
has been making DSP accelerator cards for 
the Mac since 1988. Like ARTA for the 
Mac, its cards use AT&T signal proces- 
sors. But unlike Apple, the company makes 
a NuBus card with separate software mod- 
ules for a variety of DSP functions (see 
photo 2). 

In the past, Spectral Innovations focused 
its attention on the technical marketplace 
(e.g., signal analysis), but now it’s in the
process of producing more mainstream ap- 
plication modules. It intends to announce 
a fax/modem/telephony module by the 
end of the year, and it has a number of 
other projects in the works. The company 
provides a development environment with 
each card, and other vendors have made 
modules that use their hardware to accel- 
erate Adobe Photoshop, LabView, IPLab, 
and MatLab. 

From a consumer’s point of view, one of 
the terrific things about the integration of 
DSP technology onto the desktop is that 
each function—whether it is audio, video, 
modem/fax, or some other tool—essen- 
tially exists as a virtual machine. The un- 
derlying hardware does not change while 
the application software creates the prod- 
uct. Besides keeping the cost down by re- 
ducing the number of pieces of hardware 
you must buy to accomplish various tasks, 
this could also reduce the size of the host 
system, especially as DSPs are integrated 
onto the motherboard. This also simpli- 
fies the upgrade process, because vendors 
need only send another disk to fix bugs or 
provide new functions. 


Forging Ahead 

CPUs are not up to the task of working 
with audio and video data. There are just 
not enough MIPS available. For now, the 
best way to handle multimedia data is to 
add a DSP to your system. You will see 
more system and peripheral vendors 
adding DSPs to their products in the com- 
ing year. 

The logical next step, of course, is the 
integration of a DSP into general-purpose 
microprocessors. With advances in chip 
integration and with the 80x86-architec- 
ture vendors trying to differentiate their 
products from their competition, the ad- 
dition of DSP functionality to an indus- 
try-standard CPU is inevitable. In many 
ways, DSPs are poised to become the math 
coprocessors of the 1990s.


John Bryan is a freelance technology writer and consultant based in San Jose,
California. You can reach him on BIX c/o “editors.”
Unlike general-purpose processors, DSPs are designed to perform a limited number of functions quickly

Many cutting-edge desktop computer applications require the processing
of real-world information, such as video and speech. Even
though general-purpose CPUs can perform 
this processing, it is decidedly not their 
forte. The best way to turn an ordinary com- 
puter into a multimedia master is to add a 
digital signal-processing chip. These chips 
provide the ability to create and modify 
complicated video and audio signals in 
real time. That’s why every Next machine 
is sold with a DSP (digital signal proces- 
sor) on board, why IBM and AT&T are 
centering their multimedia offerings on 
DSPs, and why future Macs will come 
equipped with them. 

What do DSPs do that is unique? Noth- 
ing, actually. Standard chips such as the 
486 can do everything that a DSP does— 
just not as fast. Conversely, a DSP can do 
most things that a standard microprocessor 
can, but in most instances, a DSP would be 
much slower than a general-purpose CPU. 
Occasionally, it would even be incapable 
of handling certain problems. 

The secret of the DSP’s success is the 
modification of standard microprocessor 
architectures, which greatly enhances the 
chip’s ability to compute the operations 
common in digital signal processing.
The canonical signal processing function is 
the weighted sum. This is usually called 
a digital filter, or a vector dot product. 
One simple application of this function is 
noise reduction via smoothing by averag- 
ing the last 7 values of the signal. Most 
signal-processing functions are more com- 
plex, but by providing an architecture 
geared to handling this class of problems, 
DSPs easily outshine general-purpose 
CPUs.
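To make the weighted sum concrete, here is a minimal Python sketch (not DSP code). The names `weighted_sum` and `smooth7` are hypothetical, and the equal weights of 1/7 are an assumption chosen to match the seven-value averaging example above.

```python
def weighted_sum(weights, values):
    """The canonical DSP operation: sum of weights[k] * values[k]."""
    acc = 0.0  # running total, the job of a DSP's accumulator register
    for w, x in zip(weights, values):
        acc += w * x  # one multiply-accumulate per term
    return acc


def smooth7(signal):
    """Noise reduction: average each sample with the 6 samples before it."""
    weights = [1.0 / 7] * 7  # equal weights turn the weighted sum into an average
    return [weighted_sum(weights, signal[n - 6 : n + 1])
            for n in range(6, len(signal))]
```

Fed the alternating input [0, 7, 0, 7, 0, 7, 0, 7, 0], `smooth7` returns approximately [3.0, 4.0, 3.0], so a swing of 7 shrinks to a swing of 1. A real filter differs only in its choice of weights.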
Architectural Highlights

DSPs contain special addressing features 
and beefed-up data buses that allow them 
to keep up with the flow of data and com- 
pute signal-processing functions quickly. 
Many general-purpose DSPs are on the 
market, and each of them has a different 
approach to finding the fastest way of mov- 
ing bits in and out. The differences be- 
tween a standard microprocessor and a 
DSP are usually found in four categories: 
instruction sets, addressing modes, inter- 
rupt structures, and structural changes. 
Many of the examples in this article are 
taken from the architecture of the Motorola 
56000 and Analog Devices’ line of signal 
processors, but DSPs made by companies 
such as Weitek or Texas Instruments share 
many of the same features. This article 
concentrates on the architectural themes 
shared by most DSPs. 


Instruction Flux 

The easiest way to get a processor to com- 
pute weighted sums quickly is to add one 
instruction that computes v1*v2+v3 -> v4
quickly. v3 and v4 are usually the same 
register or a memory location called the 
accumulator, and it holds the partial total 
of the weighted sum as it’s calculated term 
by term. v1 and v2 hold the weight and
the value of the function. A digital filter 
can be computed by stringing together a 
number of these operations. 

Adding this instruction to a processor 
forces you to make changes to standard 
processor architectures. Most DSPs devote a large section of silicon to a
multiplication unit that can multiply v1 by v2
in one instruction cycle. This unit is often 
pipelined to save silicon space. In contrast, 
early versions of the Sun SPARC proces- 
sors did not have a multiplication instruc- 


tion. The compiler would simulate the mul- 
tiplication out of shifts and additions. This 
points out a major difference between 
DSPs and CPUs: General machines spend 
more time moving information and bits 
around than they do multiplying them; 
DSPs spend their lives doing multiplica- 
tion, so it pays to devote a lot of silicon 
to this feature. 
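What the compiler has to do without a multiply instruction can be sketched in a few lines. This is a minimal Python illustration of shift-and-add multiplication for nonnegative integers; the function name and its step counter are hypothetical, added only to show how much work one multiplication becomes.

```python
def shift_add_multiply(a, b):
    """Multiply nonnegative integers using only shifts, adds, and bit tests.

    Returns (product, steps), where steps counts loop passes: one per bit
    of b, versus a single cycle on a DSP's hardware multiplier.
    """
    if a < 0 or b < 0:
        raise ValueError("this sketch handles nonnegative operands only")
    product = 0
    steps = 0
    while b:
        if b & 1:          # is the low bit of the multiplier set?
            product += a   # then add in the shifted multiplicand
        a <<= 1            # shift the multiplicand left one place
        b >>= 1            # move on to the next multiplier bit
        steps += 1
    return product, steps
```

Multiplying 13 by 11 (binary 1011) takes four passes, each with a test, a possible add, and two shifts; a 16-bit multiplier can take up to 16 passes.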

The basic v1*v2+v3 -> v4 instruction
takes three values from a register file and
sends one back. A general DSP could execute
the instruction when v1, v2, v3, and
v4 are different registers, or memory locations.
This would make it easier for the
compiler to reduce complex arithmetical 
expressions to machine code. RISC archi- 
tectures often place no restrictions on the 
use of registers for just this purpose. 

The architectural cost of this approach, 
though, is often too high, even in the age of 
3-million-transistor chips. You would need 
three data buses on the chip and extra cir- 
cuitry to handle all the general cases that 
might come up. In almost all cases, how- 
ever, the generality wouldn’t be used by a 
DSP processing filter, which usually in- 
cludes instructions where v3 and v4 are 
the same register. For that reason, many 
DSPs include a special accumulator register and can process only functions of the
form v1*v2+ACC->ACC (where ACC is
the accumulator register). This accumula- 
tor is usually twice as big as a regular reg- 
ister to avoid rounding off the results of 
the multiplication after each step. 
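The payoff of a double-width accumulator shows up even in a toy model. The Python sketch below assumes 16-bit Q15 fixed-point data (a common DSP format, but an assumption here, not something the article specifies) and uses hypothetical function names. It compares rounding once at the end with rounding after every multiply. Python integers never overflow, so the sketch models precision only, not the accumulator's finite width.

```python
Q = 15  # Q15 format: a 16-bit integer v stands for the fraction v / 2**15


def to_q15(x):
    """Convert a fraction in [-1, 1) to a 16-bit Q15 integer, clamping."""
    return max(-32768, min(32767, round(x * (1 << Q))))


def mac_wide(coeffs, samples):
    """Accumulate full 32-bit products in a double-width register; round once."""
    acc = 0
    for c, s in zip(coeffs, samples):
        acc += c * s  # 16 x 16 -> 32-bit product kept at full precision
    return (acc + (1 << (Q - 1))) >> Q  # round back to Q15 at the very end


def mac_narrow(coeffs, samples):
    """Round each product back to 16-bit Q15 before adding it in."""
    acc = 0
    for c, s in zip(coeffs, samples):
        acc += (c * s + (1 << (Q - 1))) >> Q  # rounding error on every step
    return acc
```

With eight coefficients of 1 and eight samples of 10,000, each product is worth about 0.3 of a 16-bit least significant bit. The wide version returns 2 (the true sum is about 2.44), while the narrow version rounds every term to zero and returns 0.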

Another important addition to the in- 
struction set of a DSP is the loop counter. 
A general microprocessor must be ready to 
execute While loops, where a block of 
code is executed until a specific test is sat- 
isfied. Loops that execute a set number of 
times are only a fraction of the loops in 
general code for RISC or CISC CPUs. Fil- 
ter functions, however, almost always use 
a set number of passes through the loop. In 
many cases, there is only one multiply- 
and-accumulate instruction in the middle of 
the loop. The extra test-and-branch in- 
struction executed at the end of each pass 
through the loop takes considerable time, 
and the time spent on this can nearly dou- 
ble the execution time of the loop. 

The solution is to add a special counter 
that can be set at the beginning of the loop. 
At each pass through the instructions in 
the body of the loop, the counter is decremented and compared to zero in parallel.
This allows the loop to execute as fast as
the instructions in the body of it because
the decrement, test, and branch instructions
are handled at the same time the main
body is executing. The extra circuitry in- 
volved in this loop counter is extensive, 
but it’s worthwhile because DSP applica- 
tions are heavily devoted to tight loops of 
predetermined length. 

Some DSPs, like those from Analog De- 
vices, include special barrel shifters that 
speed computations of functions (e.g., the 
fast Fourier transform). These allow the 
programmer to quickly shift a word of data 
over several bits. 


The DSP difference

• single-instruction multiply-accumulate
• multiple data buses
• programmer-accessible caches
• specialized interrupt schemes
• loop-optimized addressing


Address Change 

The architects who design DSPs also look 
at the pattern of memory references to de- 
termine the quickest way to increase the 
throughput of data. The standard address- 
ing mode of a RISC microprocessor is to 
load a value from a direct address. Older 
CISC architectures (e.g., the 80x86 and 
the 680x0) also include indirect addressing 
modes, where a pointer is followed and 
occassionally incremented. These modes 
are usually supported by a DSP. 

However, DSP designers also included 
stranger addressing modes that are imme- 
diately useful for implementing filter func- 
tions on the DSP. In most cases, a DSP 
takes a signal at time t and computes a filter function over it and the previous i-1 values.
The best way to store these i values is as a
block of i words of memory. At time t, the
signal value is stored in word t MOD i (t
MOD i is what is left over after dividing t
by i).
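A minimal sketch of this storage scheme, written in Python with a hypothetical `CircularBuffer` class: each new sample overwrites the oldest one in word t MOD i, so no data is ever shifted around in memory, and the last i samples can always be read back by counting back from the newest.

```python
class CircularBuffer:
    """Hold the last i samples; the sample for time t goes in word t mod i."""

    def __init__(self, i):
        self.i = i
        self.words = [0] * i  # the block of i words of memory
        self.t = -1           # time index of the newest sample

    def store(self, value):
        self.t += 1
        self.words[self.t % self.i] = value  # overwrites the oldest sample

    def last(self, k):
        """Return the sample from k steps ago (k = 0 is the newest)."""
        if not 0 <= k < self.i:
            raise IndexError("only the last i samples are kept")
        return self.words[(self.t - k) % self.i]
```

After storing 10, 20, 30, 40, 50, and 60 in a four-word buffer, memory holds [50, 60, 30, 40]: the two newest samples have replaced the two oldest, and `last(3)` returns 30.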

Many DSPs include a modular address- 
ing mode that will look up a value at a lo- 
cation and an offset; increment the offset; 
and if the new offset reaches the
size of the buffer, reset the offset to zero.


It can do all this in one instruction cycle. 
Figure 1: The main feature of the 56001 is the two 24-bit data buses (i.e., XDB and YDB) that feed into the ALU. The ALU can
multiply the values on these buses and add the result to the 56-bit accumulator in one operation, speeding up many signal- 
processing operations. The 56001 also features large on-chip instruction and data caches. The separate address-generation 
unit saves the ALU from having to calculate the address of the next data item. 


This work is handled by a separate ALU 
for computing the addresses. RISC sys- 
tems, in contrast, have only one addressing 
mode to remove the need for the extra 
ALU, and processing a circular buffer 
takes many extra cycles. Here is the string 
of instructions that would handle this for a 
RISC processor: 


t <- t+1            ; increment time
r1 <- t mod i       ; wrap time into the buffer
r2 <- base+r1       ; add offset to base
store value at r2   ; store it away
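The DSP's increment-and-wrap rule and the RISC sequence's t mod i produce the same offsets. The Python sketch below (function names are hypothetical) models both. The DSP version needs only an add and a compare, which its address ALU handles alongside the main computation; the RISC version needs a division, or an explicit compare and branch, on the main ALU.

```python
def modular_offsets(size, count, start=0):
    """Emulate a DSP modular-addressing mode: post-increment, wrap to zero."""
    offset = start
    out = []
    for _ in range(count):
        out.append(offset)  # address offset used by this access
        offset += 1         # increment the offset ...
        if offset >= size:  # ... and reset it once it runs off the buffer
            offset = 0
    return out


def risc_offsets(size, count, start=0):
    """The RISC approach: recompute (t mod i) for every access."""
    return [(start + t) % size for t in range(count)]
```

For a five-word buffer, both functions produce 0, 1, 2, 3, 4, 0, 1, and so on.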


Another popular but seemingly strange
addressing mode of DSPs is to reverse the
bits. For example, an address such as 10
(1010 in binary numerals) is interpreted
as 5 (0101 in binary numerals); in a chip,
the addresses take up the full word: 32
bits. This simple flip makes programming
fast Fourier transforms quicker, often as
much as 10 times faster than on a
similar RISC chip with the same cycle
time and MIPS rating. It should be easy
to see why when you imagine trying to reverse
the bits in a word using standard
RISC operations.
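On a conventional CPU, each reversal is a loop of shifts, masks, and ORs. The minimal Python sketch below (function names are hypothetical) shows the work a bit-reversed addressing mode folds into the address calculation, applied to the index shuffle a radix-2 FFT needs.

```python
def reverse_bits(x, nbits):
    """Reverse the low nbits bits of x (1010 -> 0101 for nbits = 4)."""
    result = 0
    for _ in range(nbits):
        result = (result << 1) | (x & 1)  # move the low bit of x into result
        x >>= 1
    return result


def bit_reversed_order(data):
    """Reorder a power-of-two-length list into bit-reversed index order,
    the input shuffle used by a radix-2 decimation-in-time FFT."""
    n = len(data)
    nbits = n.bit_length() - 1
    if n == 0 or 1 << nbits != n:
        raise ValueError("length must be a power of two")
    return [data[reverse_bits(k, nbits)] for k in range(n)]
```

For an eight-point transform, indices 0 through 7 come out in the order 0, 4, 2, 6, 1, 5, 3, 7. Every one of those lookups costs a loop on a RISC chip but nothing extra on a DSP.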
Double-Decker Buses

Getting information on and off the chip is 
a problem for any microprocessor design- 
er, but DSP architects have made changes 
to standard processor design that have 
tuned these chips for high-speed data trans- 
fer. The most obvious change is splitting 
the processor/memory interface into an 
instruction stream and a data stream. This 
is an easy modification to make because 
DSPs often use small looping programs 
that process large streams of data. This al- 
lows the programmer and the processor to 
optimize the use of both of these paths. 

Many DSPs from Motorola, Analog De- 
vices, and other companies take this one 
step further. They have two data buses that 
grab data from the main memory simul- 
taneously (see figure 1), which lets the 
chip read the two operands to be multi- 
plied in the weighted sum in one step. This 
significantly increases the speed of the 
DSP because it reduces the bottleneck be- 
tween memory and the processor. 

DSPs don’t go the next logical step (i.e., 
adding a third bus to write the data) be- 
cause most filter functions take many in- 
puts for each output. Not as much infor- 
mation flows in the other direction. 

A traditional microprocessor (see figure 2) has a cache that lies between the
chip and the main memory. This cache 
keeps a copy of the last n memory items 
that were referenced by the processor. 
Thus, it’s able to supply these items to the 
processor faster than the main memory 
system can. 

Caches work on the principle that much 
of the data that is touched by the processor 
is often reread a short time afterward. 
DSPs, on the other hand, have different 
access patterns. Most data comes into the 
chip once, and the result computed from 
the data leaves immediately afterward. 
When the data is reused, it’s often done 
in a predictable way that can be exploited 
by the programmer. 

Smoothing filters that use circular ar- 
rays, for instance, look only at the last i 
values of the function. Caches could keep 
track of these values, but it’s better to leave 
this functionality off the chip because the 
circuitry required to determine the oldest 
values in the cache takes up silicon and 
adds a delay to the data bus. 

This is worth the trouble in a general 
chip, where the complex data-access pat- 
terns would not be easily anticipated by 
the programmer. With DSPs, however, 
speed is so important that the optimization
of performance in tight loops and other
areas that normally fall to a cache is handled by the programmer.

Figure 2: Like other general-purpose processors, the 486 is optimized to keep irregular blocks of data moving on and off the chip.
The cache keeps copies of the most recently accessed data, but there is no facility for the programmer to designate what
information is to be kept on-chip. The ALU is split into floating-point and integer units because the chip must handle both types of
arithmetic, but the ALU doesn’t have an accumulator or a fast multiplier to speed up multiply-accumulate functions.

DSPs often provide a small amount of 
local memory. For example, the Motorola 
96000 has two banks of 512 32-bit words— 
one for each incoming data bus. The pro- 
grammer can access each of these banks 
directly and arrange the access pattern of 
the program to keep the necessary data on 
the chip. Someone calculating a smoothing 
function of the last i values would keep 
the circular buffer in this memory space. A 
program that did not reuse the data, though, 
wouldn’t use this special cache. 

The instruction stream is handled in 
much the same way for similar reasons. 
The chips often provide a small amount 
of on-chip memory to hold small loops, 
and it would be possible to include the 
cache hardware to do this automatically. 
But that takes circuitry, and a cache cannot 
do the job as well as a programmer. 

Here’s one obvious case. Imagine that a 
program spends most of its time in a small 
loop that adds reverb to a guitar signal. 
After every million times through the loop, 
the programmer/composer slightly tweaks 
the weights used in the filter functions. 
When it goes to do this tweaking, a cache
would dutifully load the recalibration code 
on top of the loop. This would slow the 
system down when it returned to the loop. 
A programmer, however, would be able 
to properly allocate the small on-chip in- 
struction memory to avoid this delay. In 
most cases, programs that run on DSPs 
have a simple enough structure that pro- 
grammers can easily predict the pattern of 
instructions and subroutines. 


Making Connections 
Many DSPs also include several ports for 
communicating with other chips. Both Ana- 
log Devices and Motorola’s DSP chips have 
two serial ports for exchanging data with 
modem chips, A/D converters, and other 
DSPs. These two lines allow the DSP to 
maintain its own connection with the out- 
side world without bothering the main 
CPU. It can get a signal from a modem and 
interpret it, notifying the main CPU only 
when the data is ready for consumption. 
In the highest-end signal-processing
implementations, several DSPs are linked
in a long chain. These arrangements can do
many different calculations, including complicated
matrix operations. But in most
cases, each DSP is responsible for its own
filter function, and the result of one DSP is
fed into another.
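In software terms, such a chain is function composition. In the Python sketch below, which assumes each DSP runs a simple FIR filter (the names `fir` and `chain` are hypothetical), each stage stands in for one DSP and hands its output to the next.

```python
def fir(coeffs, signal):
    """One filter stage: y[n] = sum of coeffs[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:  # treat samples before the start as zero
                acc += c * signal[n - k]
        out.append(acc)
    return out


def chain(stages, signal):
    """Feed the result of each stage into the next, like DSPs in a chain."""
    for coeffs in stages:
        signal = fir(coeffs, signal)
    return signal
```

Passing a single impulse through a two-point sum followed by a two-point difference yields [1, 0, -1, 0]. This is the same response a single combined filter would give, but the work is split between two processors.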


Floating-Point vs. Fixed-Point 
General-purpose CPUs usually handle two 
types of numbers: integers and floating- 
point values. In many cases, however, they 
don’t explicitly support floating-point cal- 
culations in hardware, because most tasks 
don’t require them. You may need a spe- 
cial floating-point chip (e.g., a 387 or a 
68882) to handle floating-point values. 

For the same reason, DSPs often come 
in two flavors: fixed-point and floating-point.
Fixed-point numbers are a cross between integers
and real numbers that provide only a fixed level of precision.

An example of such a fixed-point system 
is the U.S. monetary system. Dollars can be 
broken down into numbers that have only
two decimal places of precision. The complexity
of the mathematics is closer to integer
arithmetic than floating-point arithmetic,
because the fractions can be easily
converted into integers. For instance, you 
can do integer arithmetic on U.S. currency 
by converting everything into cents. 
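Here is a minimal Python sketch of the cents idea, assuming a scale factor of 100 (two decimal places) and hypothetical helper names. Addition is ordinary integer addition. Multiplication needs one extra step because multiplying two scaled numbers also multiplies their scale factors. Fixed-point DSPs use the same trick with a power-of-two scale, so their rescaling is a shift rather than a division.

```python
SCALE = 100  # two decimal places: one unit is one cent


def fx_add(a, b):
    """Both operands share the same scale, so plain integer addition works."""
    return a + b


def fx_mul(a, b):
    """The raw product carries a scale of SCALE**2; divide once by SCALE,
    rounding half away from zero."""
    p = a * b
    q, r = divmod(abs(p), SCALE)
    if 2 * r >= SCALE:
        q += 1
    return q if p >= 0 else -q


def fmt(a):
    """Render a fixed-point value as dollars and cents."""
    sign = "-" if a < 0 else ""
    return f"{sign}{abs(a) // SCALE}.{abs(a) % SCALE:02d}"
```

Multiplying a $19.99 price (1999) by a factor of 1.08 (108) gives a raw product of 215,892, which rescales to 2159, or $21.59.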

Floating-point chips must be able to
handle wide ranges of numbers, though.
They must be able to multiply numbers such as 1 and 2.2 scaled by
very different powers of 10 and find the right value. This requires
large shifters that can shift the bits
of the two numbers until they align correctly
for the operation. This takes space
and adds plenty of complexity.

Why have fixed-point chips? Most DSP 
operations involve plenty of fractions, and 
the fixed-point representation makes life 
easier for the programmer who would rather 
not convert everything to integers. In fact, 
overflows and underflows are the only big 
differences a programmer will find between 
fixed-point and floating-point DSPs. The 
programmer must watch for numbers that 
get too big or too small and trap for them. 
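The trap described above can be sketched in a few lines. This Python example assumes 16-bit two's-complement values (word sizes vary from chip to chip) and uses hypothetical function names. It shows how a plain 16-bit adder silently wraps a large positive sum into a negative number, how to detect that, and saturation (clamping to the largest or smallest representable value), one common remedy.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def wrap16(x):
    """What a plain 16-bit adder does on overflow: keep only the low 16 bits."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def add_checked(a, b):
    """Add two 16-bit values; return (wrapped result, overflow flag)."""
    exact = a + b
    return wrap16(exact), not (INT16_MIN <= exact <= INT16_MAX)


def add_saturating(a, b):
    """Clamp the sum to the representable range instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, a + b))
```

Adding 30,000 and 10,000 wraps to -25,536 and sets the overflow flag. The saturating version returns 32,767 instead, which is far less disruptive in an audio or control signal.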


Interrupts 

One of the most important jobs of a DSP is 
processing data in real time. It must be 
able to handle information from an instru- 
ment like a seismometer while the ground 
is still shaking. 

Standard CPUs come with an interrupt 
structure, which allows other hardware to 
get the CPU’s attention. These general sys- 
tems are designed to be used in many ways. 
When the interrupt is called, the state of the 
system is saved, and the process jumps to
a new location determined by the operating 
system. When the work is done, the inter- 
rupt system restores the old state and gets 
back to work. This is simple, and it handles 
all possible cases that come its way. 

The DSP, however, must handle incom- 
ing data without slowing down the pro- 
cess. That’s why most DSPs come with a 
special interrupt mode that inserts a small 
number of instructions into the standard 
instruction stream. For example, the Mo- 
torola 56000 allows the programmer to 
define two general instructions as the in- 
terrupt. When the signal arrives, the two in- 
structions are placed at the top of the pipe- 
line, and the standard instruction stream is 
held up for two instructions. Usually, these 
instructions are enough to grab a value 
from one of the serial ports and store it in 
memory. This type of interrupt can be dan- 
gerous, because the two instructions can 
do anything to the state of the machine. 
When it’s used correctly, though, it keeps 
the data coming in as fast as possible. 


A Workhorse for the 1990s 
DSPs are becoming popular for attacking
problems that involve heavy number crunching.
The architecture is tuned to get data
onto the chip, execute multiply-accumulate
instructions, and get the data off-chip
as fast as possible. The modifications
in the standard processors’ instruction set, 
data buses, and interrupt structure are sim- 
ple and general enough to be useful in a 
number of nonsignal applications (e.g., 
matrix multiplication or neural networks). 

In one sense, DSPs are the last thriving 
remnants of CISC architectures. The chips 
include many special instruction formats
that are useful for frequently occurring
operations. These features are difficult for
a compiler to use efficiently in all cases, 
but this is not a limitation because DSPs 
spend most of their time in small loops 
that programmers can hand-tune in as- 
sembly code. 

An ordinary computer can be converted 
into a multimedia master by adding a dig- 
ital signal-processing chip. The populari- 
ty of multimedia applications could make 
DSPs as popular in the 1990s as math coprocessors were in the 1980s.


Peter Wayner is a consulting editor for 
BYTE. You can contact him on BIX as 
“pwayner.” 


STATE OF THE ART, Signal Computing 


A PLATFORM FOR 
SIGNAL COMPUTING 


Analog Devices couples a reprogrammable signal processor 
with a standard API to create an open platform 


Signal computing integrates dynamic,
new real-time data types into the
static world of data processing. In
turn, these data types will move your
computer interface beyond the GUI to the
point where even neophytes will feel
comfortable using a computer.

Analog Devices has developed an open, 
reprogrammable signal-processing envi- 
ronment capable of manipulating these 
real-time data types. This is an environ- 
ment for real-time, signal-based software 
applications, developed and run on a re- 
programmable signal processor under the 
control of host-based applications. Within 
this environment, application-specific signal
I/O ports acquire and generate the real-time
signal I/O.

The primary data types of this environ- 
ment are voice, audio, wired and wireless 
communications streams, and video. With 
such a platform, you can bring real-time 
multimedia applications (e.g., digital pho- 
tography, high-speed image compression, 
language translation, and teleconferenc- 
ing) to the desktop. 


Multimedia Signals 
Many of the above applications already 
exist. However, each one usually has its 
own proprietary plug-in hardware plat- 
form, with its own signal processor, mem- 
ory, signal I/O peripherals, proprietary 
host interface, proprietary applications 
monitor, and proprietary applications code. 
Because these were developed as stand- 
alone, fixed-function applications, it is ex- 
pensive and difficult to incorporate them 
into other personal-computer-based software.

A low-cost signal processor (support- 
ed by inexpensive signal I/O ports and 
open software standards), coupled with an
open software environment, can generate
the widest developer base and, consequently,
the greatest variety of applications.
Software developers will be able to
license these turnkey signal-processing li- 
braries from entrepreneurial signal-pro- 
cessing experts and incorporate the new 
data types into their own applications. 


Soft Modems 

Two years ago, Ken Kretchmer of Action 
Consulting (Palo Alto, CA) described a 
soft modem as a modem that uses a total- 
ly RAM-based signal processor to imple- 
ment all the controller functions (e.g., 
V.42bis and the Hayes AT commands) on 
the same signal processor that runs the 
V.32bis modulation algorithms. Kretch- 
mer projected that the modem would be 
the first signal-processing application to 
achieve true software status, where all the 
modem’s software algorithms resided in 
memory. 

Digicom Systems (Milpitas, CA) held 
the same view and pioneered in the use of 
reprogrammable signal processors in high- 
performance modems. Last June, at the 
New York PC Expo, Digicom became the 
first modem manufacturer to announce a 
soft modem, the first signal-computing 
software product providing throughput of 
up to 57,600 bps and using the 14,400-bps 
V.32bis with V.42bis data compression. 
In addition, the modem includes Group 3
V.29 fax, all modem and fax fallback stan- 
dards, and controller functions, such as 
the AT+Voice command set, MNP level 5, 
and class 1 fax. For less than $30, person- 
al computer integrators are able to pur- 
chase all this software and two chips: a 
reprogrammable signal processor and a 
modem front end. 
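The 57,600-bps figure is the 14,400-bps V.32bis line rate multiplied by about 4:1, the best-case ratio usually quoted for V.42bis compression. The arithmetic sketch below treats the compression ratio as an input, because the actual ratio depends on the data: already-compressed files barely shrink at all. The constant and function names are illustrative, not part of any modem API.

```python
# Effective host-side throughput of a V.32bis modem running V.42bis compression.
# Assumption: the compression ratio is an input; about 4:1 is the best case
# usually quoted for V.42bis, and already-compressed data gains almost nothing.

LINE_RATE_BPS = 14_400  # V.32bis modulation rate, bits per second


def effective_throughput(line_rate_bps: float, compression_ratio: float) -> float:
    """Data rate seen by the host when the line carries compressed data."""
    return line_rate_bps * compression_ratio


def transfer_seconds(file_bytes: int, line_rate_bps: float,
                     compression_ratio: float) -> float:
    """Time to send a file, ignoring framing bits and protocol overhead."""
    return file_bytes * 8 / effective_throughput(line_rate_bps, compression_ratio)


if __name__ == "__main__":
    for ratio in (1.0, 2.0, 4.0):
        rate = effective_throughput(LINE_RATE_BPS, ratio)
        secs = transfer_seconds(1_000_000, LINE_RATE_BPS, ratio)
        print(f"{ratio:.0f}:1 -> {rate:,.0f} bps, 1-MB file in {secs:.0f} s")
```

Only the 4:1 case reaches the advertised 57,600 bps; at 1:1 the modem delivers its raw 14,400 bps.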


Beyond Data Processing 
Analog Devices sees signal computing as 
an industry model to enable the open de- 
velopment and use of real-time signal-pro- 
cessing applications. It has defined inex- 
pensive, reprogrammable chip sets and 
developed a nucleus of IAVs (indepen- 
dent algorithm vendors) to leverage sig- 
nal computing’s development. 

Two unique features of the signal-com- 
puting chip set are its reprogrammability 
and extensibility. First, Analog Devices’ 
product is RAM-based. The modem can be
reprogrammed, patched, enhanced, and up- 
graded through software. This is especially 
valuable to closed hardware platforms (e.g., 
laptops and palmtops). Second, the chip set 
is based on an inexpensive, general-pur- 
pose 16-bit DSP (digital signal processor), 
and the companion chip is a fully integrat- 
ed modem front end. At an integrator’s dis- 
cretion, additional signal I/O ports (pro- 
viding voice I/O, stereo audio I/O, and so 
forth) can be added to the base chip set. 
Voice-recognition, text-to-speech conver- 
sion, or music-synthesis software can also 
be added to incrementally extend the base 
chip set’s capabilities. Speech recognition, 
music synthesis, and image compression 
become software products that essentially 
run on your fax/modem. 


Signal-Processing Data Types 

The major difference between signal pro- 
cessing and data processing is the real- 
time nature of the data being processed. 
Real-time data is simply a signal (e.g., an 
audio waveform) that is sampled or gen- 
erated in real time. Signal processors are 
designed to handle the unique numerical 
requirements of processing real-time data 
and to interface with real-time signal-ac- 
quisition components. 

Modems compress data into real-time 
communications streams to fit narrow- 
band channels. Similarly, high-speed wired 
networks, infrared LANs, mobile radios, 
and satellite RF links use signal proces- 
sors to compress and decompress data in 
real time. If a real-time processor falls behind by even a millisecond or two, the consequence is lost signals, resulting in garbled voice or corrupt data.
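What "lost signals" means can be sketched with the small buffer that typically sits between a sampling port and the processor. In this minimal model (the class name, capacity, and stall length are hypothetical), the port writes one sample every sampling period and cannot wait. If the processor stops reading long enough for the buffer to fill, new samples have nowhere to go and are dropped. At 8 kHz, a 2-ms stall produces 16 samples.

```python
from collections import deque
from typing import Deque, Optional


class SampleBuffer:
    """Hypothetical fixed-size FIFO between a signal I/O port and a processor."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.fifo: Deque[int] = deque()
        self.dropped = 0  # samples lost because the FIFO was full

    def port_write(self, sample: int) -> None:
        # Called once per sampling period; the port cannot wait for the processor.
        if len(self.fifo) >= self.capacity:
            self.dropped += 1
        else:
            self.fifo.append(sample)

    def processor_read(self) -> Optional[int]:
        # Called by the signal processor whenever it has time.
        return self.fifo.popleft() if self.fifo else None


def samples_lost(rate_hz: int, stall_ms: float, capacity: int) -> int:
    """Samples dropped if the processor reads nothing for stall_ms milliseconds."""
    buf = SampleBuffer(capacity)
    for n in range(round(rate_hz * stall_ms / 1000)):
        buf.port_write(n)
    return buf.dropped


if __name__ == "__main__":
    # A 2-ms stall at 8 kHz produces 16 samples; an 8-sample FIFO overflows.
    print(samples_lost(8_000, 2, capacity=8))  # prints 8
```

A larger buffer hides longer stalls, but only at the cost of added latency. This is why the article stresses interrupt capability and I/O architecture instead of simply adding buffering.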

Voice applications (e.g., cellular phones 
and digital answering machines) require 
real-time signal I/O at frequencies as high 
as 8 kHz for compression and up to 16 
kHz for speech recognition (to capture the 
high-frequency voice tones, such as an s sound).
Audio applications (e.g., digital stereo play- 
back, music synthesis, and digital stereo 
recording) require real-time signal I/O 
at frequencies as high as 48 kHz to cap- 
ture high-frequency audible signals, such 
as crashing cymbals. And motion video 
requires real-time signal I/O at frequencies between 1 and 30 MHz, depending
on the size and update frequency of the 
image. 
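The voice and audio rates above follow from the Nyquist criterion: a signal sampled at f_s can represent frequencies only up to f_s/2. The sketch below tabulates that limit, together with the number of samples arriving each millisecond, for the rates cited in the text. It is plain arithmetic, and the labels are descriptive only.

```python
# Nyquist limit and samples per millisecond for the rates cited in the text.
RATES_HZ = {
    "voice compression": 8_000,
    "speech recognition": 16_000,
    "digital audio": 48_000,
}


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a sampled signal can represent without aliasing."""
    return sample_rate_hz / 2


def samples_per_ms(sample_rate_hz: float) -> float:
    """Samples the processor must accept every millisecond."""
    return sample_rate_hz / 1000


if __name__ == "__main__":
    for use, rate in RATES_HZ.items():
        print(f"{use:18s} {rate:6d} Hz: content up to {nyquist_hz(rate):7.0f} Hz, "
              f"{samples_per_ms(rate):4.0f} samples/ms")
```

With 8-kHz sampling, nothing above 4 kHz survives. That is why recognizers that must distinguish sibilants such as s, whose energy lies largely above 4 kHz, sample at 16 kHz. At 48 kHz, the 24-kHz limit sits above the roughly 20-kHz ceiling of human hearing.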

Many other real-time signal-processing systems (e.g., noise cancellation and encryption) require lossless I/O. The lossless characteristic of real-time signal I/O places burdens on the signal processor, especially on its architecture, I/O peripherals, and interrupt capabilities. The signal processor needs the computational bandwidth to finish processing each sample within one sampling period. Although data processors can handle some real-time signal applications, they were not designed for these tasks.
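The requirement to keep up with the sampling rate can be made concrete as a per-sample budget. If the processor executes M million instructions per second and samples arrive at f_s per second, the work for each sample must fit, on average, in M × 10^6 / f_s instructions. The 20-MIPS figure below is a hypothetical round number, not the specification of any chip named in this article.

```python
# Per-sample instruction budget for real-time processing.
# Assumption: HYPOTHETICAL_MIPS is an illustrative round number.
HYPOTHETICAL_MIPS = 20


def instructions_per_sample(mips: float, sample_rate_hz: float) -> float:
    """Average instructions available for each incoming sample."""
    return mips * 1e6 / sample_rate_hz


def keeps_up(instructions_needed: float, mips: float, sample_rate_hz: float) -> bool:
    """True if an algorithm needing this many instructions per sample is real-time."""
    return instructions_needed <= instructions_per_sample(mips, sample_rate_hz)


if __name__ == "__main__":
    for rate in (8_000, 16_000, 48_000):
        budget = instructions_per_sample(HYPOTHETICAL_MIPS, rate)
        print(f"{rate:6d} Hz: {budget:6.0f} instructions per sample")
```

For example, a 100-tap FIR filter needs 100 multiply-accumulates per output sample. On a processor that completes one multiply-accumulate per instruction, that fits easily in the 2,500-instruction budget at 8 kHz. At 48 kHz, however, it consumes about a quarter of the roughly 417-instruction budget, and a stereo pair doubles the load.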


Signal I/O 

To incorporate real-time data types into a personal computer, you need real-time signal I/O ports or mixed-signal peripherals. Such devices preserve the information contained in a signal while transforming its format into one appropriate for the next stage in its journey: processing or transmission. They must be capable of sampling the signal stream at a frequency appropriate for the data type with the necessary accuracy or fidelity. They must also be able to preprocess the acquired data. When generating signal output, the devices should postprocess the digital data stream and output signals with the frequency, accuracy, phase, and gain appropriate to the data type. When used in signal-computing environments, the devices should easily interface with the signal processor.

Third-party companies are integrating 
audio-band I/O ports into the A/D-signal- 
computing platform. Wireless communi- 
cations and video I/O ports (e.g., base- 
band I/O and infrared/RF components, 
video-capture boards and scanners, and 
real-time video-compression components) 
will be designed into the integrated per- 
sonal computers of the future. Note that 
the performance of these signal I/O ports 
is just as important to signal computing 
as the performance of the keyboard, disk 
drive, mouse, and display is to personal 
computing. 

PSTN (public switched telephone net- 
work) applications on a personal comput- 
er (e.g., modem, fax, and speech) require a 
direct-interface, single-chip, echo-cancel- 
ing front end. This component must handle 
standard sampling frequencies and include 
on-chip resampling/interpolation filters 
for real-time signal synchronization and 
phase adjustment. 
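Resampling/interpolation filters estimate the signal between the instants at which it was actually sampled. This lets a front end shift its effective sample timing by a fraction of a sample to stay synchronized with the far end. Real front ends use higher-order interpolation filters. The sketch below swaps in plain linear interpolation, a deliberately simpler technique, only to show what a fractional-delay (phase) adjustment does to a sampled signal. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def fractional_delay(x: Sequence[float], delay: float) -> List[float]:
    """Delay x by a possibly non-integer number of samples.

    Output sample n estimates the input at time n - delay by linear
    interpolation between the two nearest input samples; samples outside
    the input are treated as zero.
    """
    def at(i: int) -> float:
        return x[i] if 0 <= i < len(x) else 0.0

    out = []
    for n in range(len(x)):
        t = n - delay
        i = math.floor(t)   # input sample just before time t
        frac = t - i        # position between sample i and i + 1, in [0, 1)
        out.append((1 - frac) * at(i) + frac * at(i + 1))
    return out


if __name__ == "__main__":
    ramp = [0.0, 1.0, 2.0, 3.0, 4.0]
    print(fractional_delay(ramp, 0.5))  # [0.0, 0.5, 1.5, 2.5, 3.5]
```

An integer delay reduces to a plain shift. The interesting cases are the fractional ones, which no amount of sample-dropping or sample-repeating can produce.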

Voice I/O applications require a linear 
voice-band codec that provides a direct 
interface with a signal processor, a micro- 
phone, and an amplified speaker. The 
codec should offer on-chip antialiasing 
and anti-imaging filters and good group-delay characteristics that simplify acoustic echo cancellation when the signal is to be broadcast in mobile (i.e., wireless) computer environments.
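Acoustic echo cancellation is commonly done with an adaptive filter. The filter learns the loudspeaker-to-microphone path and subtracts its estimate of the echo from the microphone signal. The sketch below uses the normalized LMS (NLMS) algorithm, a standard textbook choice. The filter length, step size, and simulated echo path are assumptions for illustration. A practical canceller would add double-talk detection and run in fixed-point arithmetic.

```python
import random
from typing import List, Sequence, Tuple


def nlms_echo_canceller(far_end: Sequence[float], mic: Sequence[float],
                        taps: int = 8, mu: float = 0.5,
                        eps: float = 1e-6) -> Tuple[List[float], List[float]]:
    """Cancel the loudspeaker echo in mic using normalized LMS.

    far_end: samples sent to the loudspeaker.
    mic:     samples from the microphone (echo plus any near-end speech).
    Returns (residual, weights); weights estimate the echo path.
    """
    w = [0.0] * taps          # adaptive estimate of the speaker-to-mic path
    history = [0.0] * taps    # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(wi * hi for wi, hi in zip(w, history))
        e = d - echo_estimate                      # signal left after cancellation
        norm = eps + sum(h * h for h in history)   # input power, for normalization
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, history)]
        residual.append(e)
    return residual, w


if __name__ == "__main__":
    rng = random.Random(0)
    echo_path = [0.6, 0.3, -0.2, 0.1]   # hypothetical room response
    far = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
    mic = [sum(h * far[n - k] for k, h in enumerate(echo_path) if n >= k)
           for n in range(len(far))]
    residual, w = nlms_echo_canceller(far, mic)
    print("learned path:", [round(v, 3) for v in w[:4]])
```

Normalizing the step by the input power keeps the adaptation speed roughly independent of loudness. That matters for speech, whose level varies widely.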

Audio applications require a single-chip, 16-bit, stereo audio-band codec, which provides a direct interface with the data processor or signal processor; stereo line-level inputs and outputs; stereo microphone-level inputs; and speaker outputs. The audio codec should have on-chip programmable gain amplifiers and automatic-calibration circuitry, and it should support the full spectrum of personal computer audio-sampling frequencies between 8 and 48 kHz.




Signal I/O ports must also be fully in- 
tegrated and designed to provide the func- 
tionality required by a wide variety of ap- 
plications for a given signal data type. 
They will be fabricated with CMOS-process technology at both 5-V and 3-V levels to enable the signal ports’ integration into the chip sets of the future.


Signal-Computing Applications 
Today, signal processors are pervasive in 
communications systems such as high-per- 
formance modems, digital mobile radio, 
digital cordless telephony, satellite com- 
munications, and videophones. Narrow- 
band communications channels require the 
compression and reconstruction of data, 
and signal processors are the engines. 

Voice and data compression are also 
used in such applications as voice mail, 
digital answering machines, and data com- 
pression for floppy/hard-disk conserva- 
tion. Real-time data types take up a lot of 
hard disk space. Even compressed, a min- 
ute of motion video can fill a hard drive. 
More efficient algorithms are being de- 
veloped to reduce channel usage and data- 
storage costs. 
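A rough calculation backs up the claim about motion video. The frame size (320 × 240 pixels), 16-bit color, 30 frames per second, and 20:1 compression ratio below are illustrative assumptions, not figures from this article. Substitute the parameters of a particular system.

```python
# Storage for one minute of digital video. Every parameter is an assumption.

def video_bytes(width: int, height: int, bytes_per_pixel: int, fps: int,
                seconds: int, compression_ratio: float = 1.0) -> float:
    """Bytes needed for a clip, optionally divided by a compression ratio."""
    raw = width * height * bytes_per_pixel * fps * seconds
    return raw / compression_ratio


if __name__ == "__main__":
    raw = video_bytes(320, 240, 2, 30, 60)
    packed = video_bytes(320, 240, 2, 30, 60, compression_ratio=20)
    # prints: raw: 276.5 MB, 20:1 compressed: 13.8 MB
    print(f"raw: {raw / 1e6:.1f} MB, 20:1 compressed: {packed / 1e6:.1f} MB")
```

At these settings, a 100-MB drive would hold about seven minutes of compressed video and well under one minute of raw video.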

Digicom Systems, Specom (Santa Clara, CA), and Lernout & Hauspie Speech Products (LHSP; Ieper, Belgium) are the first IAVs to provide data- and voice-compression technology for communications applications within the signal-computing environment. Digicom provides modem and fax capability. Specom provides CELP (code-excited linear prediction) and TIA IS-54 VSELP voice-compression capability. And LHSP provides SBCELP.
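CELP coders work by analysis by synthesis. For each short block of speech, the encoder passes candidate excitation vectors from a codebook through a linear-prediction synthesis filter. It then transmits the index and gain of the candidate whose output best matches the input. The sketch below shows only that core search. It is not Specom’s or LHSP’s implementation, and it omits the perceptual weighting filter, the adaptive (pitch) codebook, filter memory between blocks, and the LPC analysis itself (the predictor coefficients are taken as given). Function names and test values are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def synthesize(excitation: Sequence[float], lpc: Sequence[float]) -> List[float]:
    """All-pole synthesis filter 1/A(z): s[n] = e[n] + sum_k a_k * s[n-k].

    Starts from zero filter state (a real coder carries the state between blocks).
    """
    out: List[float] = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a * out[n - k]
        out.append(acc)
    return out


def search_codebook(target: Sequence[float], codebook: Sequence[Sequence[float]],
                    lpc: Sequence[float]) -> Tuple[int, float]:
    """Return (index, gain) of the codevector whose synthesized output
    best matches target in the least-squares sense."""
    best_index, best_gain, best_score = -1, 0.0, -1.0
    for i, code in enumerate(codebook):
        y = synthesize(code, lpc)
        ty = sum(t * v for t, v in zip(target, y))
        yy = sum(v * v for v in y)
        if yy == 0.0:
            continue
        score = ty * ty / yy   # squared-error reduction with the optimal gain ty/yy
        if score > best_score:
            best_index, best_gain, best_score = i, ty / yy, score
    return best_index, best_gain


if __name__ == "__main__":
    rng = random.Random(42)
    lpc = [0.9, -0.4]   # hypothetical stable 2nd-order predictor
    codebook = [[rng.gauss(0, 1) for _ in range(40)] for _ in range(64)]
    target = [1.5 * v for v in synthesize(codebook[17], lpc)]
    print(search_codebook(target, codebook, lpc))  # index 17, gain close to 1.5
```

Only the index and gain (plus the filter coefficients, sent less often) cross the channel. This is why CELP reaches low bit rates: the decoder holds the same codebook and filter and regenerates the speech locally.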

The Signal-Computing Environment

The essence of signal computing lies in the integration of three fundamental competencies into a signal-processing solution: signal I/O ports, signal-processing software, and a reprogrammable digital signal processor.

The signal I/O ports capture or generate the applications’ signals, converting them back and forth between the analog and digital domains. In the figure, there are three signal I/O ports: a SUI (Sound User’s Interface), which provides voice- or audio-quality I/O; a PSTN (public switched telephone network) interface, which connects to the phone lines for modem and fax I/O; and a TDMA (Time Division Multiple Access) peripheral, which connects to infrared/RF transmit and receive components for wireless communications.

The algorithm software performs the mathematically complex and intensive signal-processing algorithms. The figure shows algorithm code for a fax/data modem, MPEG audio compression, JPEG image compression, and speech recognition. When not in use, the code resides on the personal computer; the host processor downloads it as necessary.

The signal coprocessor provides the mathematical horsepower to process the signal-computing algorithms. It has support circuitry to interface with the signal I/O ports, as well as with a host processor. The signal-computing applications’ RAM sits alongside the signal processor, enabling the reconfigurability of the signal-computing engine.

Figure: THE PERSONAL COMPUTER-DSP CONNECTION. Using the abstraction of a multimedia API, the signal processor and its algorithms are invisible to host-based applications. Interprocess communications between the signal-computing platform and the host are bus-independent, letting the signal processor reside on a motherboard, on an expansion card, or in a peripheral device.

As signal-computing applications grow and become more complex, there will be an increasing need for a real-time multitasking kernel and mixed-signal device drivers. Spectron Microsystems (Santa Barbara, CA) has pioneered the standardization of the DSP operating system with SPOX. The company has also defined and standardized an OSPA (open signal coprocessing architecture) for layered host/DSP interprocessor communications. These software tools decouple the algorithm code from the signal processor and the signal I/O port hardware and enable the portability of applications software. OSPA and SPOX work with signal processors from Analog Devices, Motorola, and Texas Instruments.

On the applications side of the signal-computing environment are the traditional components of most personal computer applications: the host microprocessor, its memory, the operating/windowing system, and applications software. In the figure, a data-communications application and a voice-annotation application access the signal-computing software. These applications can be databases, spreadsheets, word processors, or yet-to-be-developed programs.

APIs are also needed to provide applications with standardized calls to the signal processor. The Interactive Multimedia Association, Spectron Microsystems (with its MINT [Media Integration] architecture), and other applications groups are standardizing multimedia and signal-processing APIs for use by independent software developers. These standards will be determined by the free-market interaction of the IAVs and the software developers.

Signal processors are pervasive in speech-recognition and speech-synthesis applications (e.g., voice navigation, hands-free/eyes-free control, security access control, and text-to-speech conversion). Speech systems can have various vocabularies, languages, training requirements, and accuracy rates; be continuous or discrete; and use options such as word spotting, lexical and syntactic analysis, and semantic processing. Speech recognition and speech synthesis rely on constantly evolving algorithms for vector quantization, acoustic and language models, neural networks, hidden Markov models, and expectation maximization. In a signal-computing environment, you can incorporate these improvements with shrink-wrapped software upgrades.
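Vector quantization is one of the techniques listed above. It replaces each short feature vector computed from the speech signal with the index of the closest entry in a codebook, so later stages (such as hidden Markov models) can work with a small alphabet of symbols. The minimal sketch of the encoding step below assumes a codebook has already been trained, for example with k-means. The 2-D vectors are toy values; real recognizers use higher-dimensional spectral feature vectors.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_encode(frames: Sequence[Sequence[float]],
              codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each feature vector to the index of its nearest codeword."""
    return [min(range(len(codebook)), key=lambda i: squared_distance(f, codebook[i]))
            for f in frames]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy, pre-trained codebook
    frames = [[0.1, -0.1], [0.9, 0.2], [0.2, 0.8], [0.95, 0.05]]
    print(vq_encode(frames, codebook))  # [0, 1, 2, 1]
```

The exhaustive nearest-neighbor search costs one distance computation per codeword per frame. Its regular multiply-accumulate structure suits a signal processor well.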

LHSP provides speech-recognition and 
speech-synthesis technology for speech 
I/O integration within the signal-computing 
framework (see the text box “Digitally 
Speaking” on page 160). LHSP provides 
speaker-independent and speaker-adap- 
tive American English, German, French, 
Spanish, Dutch, and Korean speech-recognition and speech-synthesis systems. And
it is working on systems for Japanese, Ital- 
ian, British English, South American Span- 
ish, and tonal languages (e.g., Mandarin 
and Cantonese). All the systems use a text- 
to-speech conversion tool called DEPES 
(Development Environment for Pronun- 
ciation Expert Systems) for rapid language 
development. 


Digital Audio 

Signal processors are widely used in audio and electronic music. Music synthesizers use signal processors as envelope generators and as digital oscillators to create various voices and such effects as tremolo and pitch blending. One of the first applications of signal processors was in professional audio for delay and artificial reverberation. Now DSPs are also being applied in consumer audio for such functions as surround-sound decoding and equalization. In the near future, signal processors will be found in car audio systems for canceling noise or improving stereo imaging.
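The synthesizer and effects functions just described reduce to a few lines of per-sample arithmetic. The sketch below generates a sine tone shaped by a simple linear attack/release envelope and applies tremolo (slow amplitude modulation). It then passes the result through a feedback delay line, the basic building block of echo and artificial-reverberation effects. Function names and parameter values are arbitrary illustrations, and floating-point Python stands in for the fixed-point code a DSP would run.

```python
import math
from typing import List


def tone_with_tremolo(freq_hz: float, seconds: float, rate_hz: int = 8_000,
                      attack_s: float = 0.01, release_s: float = 0.05,
                      tremolo_hz: float = 5.0, depth: float = 0.3) -> List[float]:
    """Sine oscillator shaped by a linear attack/release envelope and tremolo."""
    out = []
    for n in range(round(seconds * rate_hz)):
        t = n / rate_hz
        # Envelope: rise over attack_s, hold at 1.0, fall over the last release_s.
        env = min(1.0, t / attack_s, (seconds - t) / release_s)
        # Tremolo: a slow oscillator swings the gain between 1 - depth and 1.
        trem = 1.0 - depth * 0.5 * (1.0 + math.sin(2 * math.pi * tremolo_hz * t))
        out.append(env * trem * math.sin(2 * math.pi * freq_hz * t))
    return out


def feedback_delay(x: List[float], delay_samples: int, feedback: float) -> List[float]:
    """Echo effect: y[n] = x[n] + feedback * y[n - delay_samples].

    With |feedback| < 1 this feedback comb filter produces a train of decaying
    repeats; several in parallel are a classic reverberator building block.
    """
    y: List[float] = []
    for n, v in enumerate(x):
        echo = y[n - delay_samples] if n >= delay_samples else 0.0
        y.append(v + feedback * echo)
    return y


if __name__ == "__main__":
    note = tone_with_tremolo(440.0, 0.5)
    wet = feedback_delay(note, delay_samples=1_200, feedback=0.4)  # 150 ms at 8 kHz
    print(len(wet), max(abs(s) for s in wet))
```

Each output sample needs only a handful of multiplies and adds. A single signal processor can therefore run many such voices and effects at once.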

One IAV working in this area is EuPhonics, whose first algorithm toolkit will be an implementation of Dolby Laboratories’ AC-2 audio-compression algorithm,
which provides a 6-to-1 reduction of stor- 
age requirements for CD-quality audio, 
with no audible degradation of the sound. 
EuPhonics also plans to offer unique dig- 
ital synthesis algorithms that will improve 
the quality of FM synthesis components 
that are used in popular add-in cards, such 
as SoundBlaster from Creative Labs (San- 
ta Clara, CA). 
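For CD-quality audio (16-bit stereo samples at 44.1 kHz), a 6-to-1 reduction translates into concrete storage numbers. The calculation below is plain arithmetic on those standard CD parameters and the ratio stated above. It does not model how AC-2 itself codes the audio.

```python
# Storage for one minute of CD-quality audio, raw and at a 6:1 coding ratio.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def bytes_per_minute(compression_ratio: float = 1.0) -> float:
    """Bytes for 60 s of audio, divided by the coder's compression ratio."""
    bits_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS  # 1,411,200 bps
    return bits_per_second * 60 / 8 / compression_ratio


if __name__ == "__main__":
    print(f"uncompressed: {bytes_per_minute() / 1e6:.2f} MB per minute")   # 10.58
    print(f"6:1 coded:    {bytes_per_minute(6) / 1e6:.2f} MB per minute")  # 1.76
```

The coded stream works out to about 235 kbps, compared with 1,411.2 kbps for the raw stream.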


Digital Imaging 

Signal processors are widely used in stat- 
ic imaging (e.g., graphics accelerators and 
digital photography), CAT scanners, mag- 
netic-resonance imaging, satellite imag- 
ing, and bar coders. They are also used in 
real-time imaging applications, such as 
videophones, radar, and sonar. 

One IAV with offerings in the imaging 
field is Xing Technology (Arroyo Grande, 
CA), which will initially provide ISO/CCITT JPEG (Joint Photographic Experts Group) image-compression algorithms and ISO MPEG (Moving Picture Experts Group) audio-compression algorithms. Xing Technology is active on the JPEG, MPEG, and
Interactive Multimedia Association com- 
mittees and has developed its software 
products using a scalable compression ar- 
chitecture. In real-time video, the viewed 
size, compressed size, refresh rate of the 
image, and quality of the image can be 
scaled to the computing resources avail- 
able. Future IAVs will offer print- and cur- 
sive-handwriting-recognition software, as 
well as graphics and digital-imaging al- 
gorithms. 
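Baseline JPEG begins by splitting an image into 8 × 8 blocks and applying a two-dimensional discrete cosine transform (DCT) to each. Most of a smooth block’s energy lands in a few low-frequency coefficients, which the later quantization and entropy-coding stages exploit. The direct-formula sketch of the forward 8 × 8 DCT below is for illustration only. It is not Xing Technology’s code, and practical codecs use fast factorized DCTs instead of this quadruple loop.

```python
import math
from typing import List, Sequence


def dct_8x8(block: Sequence[Sequence[float]]) -> List[List[float]]:
    """Forward 2-D DCT of an 8x8 block, with the normalization used in JPEG.

    F(u,v) = 1/4 * C(u) * C(v) * sum_x sum_y f(x,y)
             * cos((2x+1) u pi / 16) * cos((2y+1) v pi / 16),
    where C(0) = 1/sqrt(2) and C(k) = 1 otherwise.
    """
    def c(k: int) -> float:
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out


if __name__ == "__main__":
    # A flat block of 8-bit value 200; JPEG first level-shifts samples by 128.
    flat = [[200 - 128] * 8 for _ in range(8)]
    coeffs = dct_8x8(flat)
    print(round(coeffs[0][0], 3))  # DC term 8 * 72 = 576.0; all AC terms are ~0
```

For a perfectly flat block, the whole block collapses into a single nonzero coefficient. That is the extreme case of the energy compaction that makes the DCT useful for compression.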


Signals in Your Future 

Analog Devices created an open signal- 
computing platform to move the power of 
DSP beyond proprietary constraints. It’s 
an environment for real-time, signal-based 
software applications. With this platform, 
you can bring real-time multimedia com- 
puting to your desktop. 

Analog Devices has defined low-cost, 
reprogrammable chip sets for this plat- 
form. But, as with the original personal 
computer, the success of the platform de- 
pends on the imagination and hard work of 
applications developers.


Tim Counihan is the strategic marketing 
manager for signal processors at Analog 
Devices (Norwood, MA). You can contact 
him on BIX c/o “editors” or on the Inter- 
net at tim.counihan@analog.com. 